### Example sense2vec.eval Usage

Source: https://github.com/explosion/sense2vec/blob/master/README.md

An example of how to run the sense2vec.eval recipe with specific senses and a similarity threshold.

```bash
prodigy sense2vec.eval vectors_eval /path/to/s2v_reddit_2015_md \
    --senses NOUN,ORG,PRODUCT --threshold 0.5
```

--------------------------------

### Install and Run Streamlit Sense2Vec Demo

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Install the Streamlit library and run the provided demo script to explore pretrained sense2vec vectors. The script requires paths to pretrained vectors as command-line arguments.

```bash
pip install streamlit
streamlit run https://raw.githubusercontent.com/explosion/sense2vec/master/scripts/streamlit_sense2vec.py /path/to/vectors
```

--------------------------------

### Install sense2vec

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Install the sense2vec library using pip. This command fetches and installs the package from the Python Package Index.

```bash
pip install sense2vec
```

--------------------------------

### Run sense2vec.teach Recipe

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Use this recipe to bootstrap a terminology list. Prodigy suggests similar terms based on sense2vec vectors, adjusting suggestions as you annotate. Ensure sense2vec is installed in the same environment as Prodigy.

```bash
prodigy sense2vec.teach tech_phrases /path/to/s2v_reddit_2015_md \
    --seeds "natural language processing, machine learning, artificial intelligence"
```

--------------------------------

### Initialize Sense2Vec with senses

Source: https://github.com/explosion/sense2vec/blob/master/README.md

When initializing Sense2Vec, you can specify the available senses. This example shows how to check if a sense is present after initialization.

```python
from sense2vec import Sense2Vec

s2v = Sense2Vec(senses=["VERB", "NOUN"])
assert "VERB" in s2v.senses
```

--------------------------------

### Initialize and Use Sense2Vec Vectors

Source: https://context7.com/explosion/sense2vec/llms.txt

Demonstrates initializing Sense2Vec with pretrained vectors, checking for keys, retrieving vectors and frequencies, finding most similar terms, calculating similarity between terms, and getting other senses of a word.

```python
from sense2vec import Sense2Vec
import numpy as np

# Initialize and load pretrained vectors
s2v = Sense2Vec().from_disk("/path/to/s2v_reddit_2015_md")

# Check if key exists and get vector
query = "natural_language_processing|NOUN"
if query in s2v:
    vector = s2v[query]  # Returns numpy.ndarray
    freq = s2v.get_freq(query)  # Returns frequency count
    print(f"Vector shape: {vector.shape}, Frequency: {freq}")

# Find most similar terms
most_similar = s2v.most_similar(query, n=5)
for key, score in most_similar:
    print(f"{key}: {score:.4f}")
# Output:
# machine_learning|NOUN: 0.8987
# computer_vision|NOUN: 0.8636
# deep_learning|NOUN: 0.8573
# artificial_intelligence|NOUN: 0.8321
# data_mining|NOUN: 0.8156

# Calculate similarity between terms
sim = s2v.similarity("python|NOUN", "javascript|NOUN")
print(f"Similarity: {sim:.4f}")

# Find other senses of the same word
other_senses = s2v.get_other_senses("duck|NOUN")
print(other_senses)  # ['duck|VERB', 'Duck|ORG', 'Duck|PERSON']

# Get best sense for ambiguous word
best = s2v.get_best_sense("apple")  # Returns highest frequency sense
print(best)  # "apple|NOUN" or "Apple|ORG" depending on corpus
```

--------------------------------

### Programmatic Usage of sense2vec.teach Logic

Source: https://context7.com/explosion/sense2vec/llms.txt

Demonstrates the internal logic of the sense2vec.teach recipe for finding seed keys, getting suggestions, and filtering them by a similarity threshold.

```python
# Programmatic usage (internal structure)
from sense2vec import Sense2Vec

s2v = Sense2Vec().from_disk("/path/to/vectors")

# Find seed keys
seeds = ["machine learning", "deep learning"]
seed_keys = []
for seed in seeds:
    key = s2v.get_best_sense(seed)
    if key:
        seed_keys.append(key)
        print(f"Seed: {seed} -> {key}")

# Get suggestions above threshold
threshold = 0.85
suggestions = s2v.most_similar(seed_keys, n=100)
filtered = [(key, score) for key, score in suggestions if score > threshold]

for key, score in filtered[:10]:
    word, sense = s2v.split_key(key)
    print(f"{word} ({sense}): {score:.4f}")
```
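
The filtering and display step can be tried without pretrained vectors. The following is a minimal sketch, assuming the default `word|SENSE` key format; the scores are made up, and `split_default_key` and `filter_suggestions` are hypothetical helpers, not part of sense2vec.

```python
from typing import List, Tuple

Suggestion = Tuple[str, float]


def split_default_key(key: str) -> Tuple[str, str]:
    """Split 'word|SENSE' into (word, sense); assumes the default key format."""
    word, sense = key.rsplit("|", 1)
    return word.replace("_", " "), sense


def filter_suggestions(suggestions: List[Suggestion], threshold: float) -> List[Suggestion]:
    """Keep only suggestions whose score is strictly above the threshold."""
    return [(key, score) for key, score in suggestions if score > threshold]


# Made-up scores standing in for the output of s2v.most_similar(...)
suggestions = [
    ("neural_networks|NOUN", 0.91),
    ("data_science|NOUN", 0.86),
    ("statistics|NOUN", 0.79),
]
filtered = filter_suggestions(suggestions, threshold=0.85)
for key, score in filtered:
    word, sense = split_default_key(key)
    print(f"{word} ({sense}): {score:.4f}")
```

Because the comparison is strict, a suggestion scoring exactly at the threshold is dropped in this sketch.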

--------------------------------

### Get all keys from Sense2Vec

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Convert the keys iterator returned by `Sense2Vec.keys()` into a list to get all string keys present in the table.

```python
all_keys = list(s2v.keys())
```

--------------------------------

### Sense2Vec Patterns Output Format

Source: https://context7.com/explosion/sense2vec/llms.txt

Example of the JSONL output written by the `sense2vec.to-patterns` recipe (here to `patterns.jsonl`), suitable for use with spaCy's EntityRuler. Each line is one pattern object.

```json
{"label": "TECHNOLOGY", "pattern": [{"lower": "machine"}, {"lower": "learning"}]}
{"label": "TECHNOLOGY", "pattern": [{"lower": "neural"}, {"lower": "network"}]}
{"label": "TECHNOLOGY", "pattern": [{"lower": "deep"}, {"lower": "learning"}]}
```
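
To make the shape of these lines concrete, here is a rough sketch of producing them from a list of phrases. It assumes plain whitespace tokenization, which is a simplification (the recipe itself works with a spaCy model), and `phrase_to_pattern` is a hypothetical helper rather than a sense2vec or Prodigy function.

```python
import json
from typing import Dict, List


def phrase_to_pattern(phrase: str, label: str) -> Dict:
    """Build one lowercase token pattern from a whitespace-tokenized phrase."""
    tokens = [token.lower() for token in phrase.split()]
    return {"label": label, "pattern": [{"lower": token} for token in tokens]}


phrases: List[str] = ["machine learning", "Neural Network"]
# One JSON object per line, as in a JSONL file
lines = [json.dumps(phrase_to_pattern(p, "TECHNOLOGY")) for p in phrases]
print("\n".join(lines))
```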

--------------------------------

### Sense2Vec Get Other Senses

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Find other entries for the same word but with a different sense.

```APIDOC
## Sense2Vec.get_other_senses

### Description
Find other entries for the same word with a different sense, e.g. "duck|VERB" for "duck|NOUN".

### Method
`get_other_senses`

### Endpoint
N/A (Instance method)

### Parameters
- **key** (unicode / int) - The key to check.
- **ignore_case** (bool) - Check for uppercase, lowercase and titlecase. Defaults to `True`.

### RETURNS
- **list** - The string keys of other entries with different senses.

### Request Example
```python
other_senses = s2v.get_other_senses("duck|NOUN")
# ['duck|VERB', 'Duck|ORG', 'Duck|VERB', 'Duck|PERSON', 'Duck|ADJ']
```
```

--------------------------------

### Sense2Vec Get Item

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Retrieves a vector for a given key. Returns None if the key is not found.

```APIDOC
## Sense2Vec.__getitem__

### Description
Retrieve a vector for a given key. Returns None if the key is not in the table.

### Method
`__getitem__`

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
- **key** (unicode / int) - Required - The key to look up.

### Request Example
```python
vec = s2v["avocado|NOUN"]
```

### Response
#### Success Response (200)
- **vector** (`numpy.ndarray`) - The vector or `None`.

#### Response Example
```json
{
  "vector": [4.0, 2.0, 2.0, 2.0]
}
```
```

--------------------------------

### sense2vec.eval

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Evaluate a sense2vec model by asking about phrase triples: is word A more similar to word B, or to word C? The recipe will only ask about vectors with the same sense and supports different example selection strategies.

```APIDOC
## sense2vec.eval

### Description
Evaluate a sense2vec model by asking about phrase triples: is word A more similar to word B, or to word C? If the human mostly agrees with the model, the vectors model is good. The recipe will only ask about vectors with the same sense and supports different example selection strategies.

### Method
PRODIGY COMMAND

### Endpoint
`sense2vec.eval`

### Parameters
#### Positional Arguments
- **dataset** (string) - Required - Dataset to save annotations to.
- **vectors_path** (string) - Required - Path to pretrained sense2vec vectors.

#### Options
- **--strategy** (`-st`) (string) - Optional - Example selection strategy: `most_similar` (default), `most_least_similar`, or `random`.
- **--senses** (`-s`) (string) - Optional - Comma-separated list of senses to limit the selection to. If not set, all senses in the vectors will be used.
- **--exclude-senses** (`-es`) (string) - Optional - Comma-separated list of senses to exclude. See `prodigy_recipes.EVAL_EXCLUDE_SENSES` for the defaults.
- **--n-freq** (`-f`) (integer) - Optional - Number of most frequent entries to limit to.
- **--threshold** (`-t`) (float) - Optional - Minimum similarity threshold to consider examples.
- **--batch-size** (`-b`) (integer) - Optional - Batch size to use.
- **--eval-whole** (`-E`) (flag) - Optional - Evaluate the whole dataset instead of the current session.
- **--eval-only** (`-O`) (flag) - Optional - Don't annotate, only evaluate the current dataset.
- **--show-scores** (`-S`) (flag) - Optional - Show all scores for debugging.

### Strategies
#### `most_similar`
Pick a random word from a random sense and get its most similar entries of the same sense. Ask about the similarity to the last and middle entry from that selection.

#### `most_least_similar`
Pick a random word from a random sense and get the least similar entry from its most similar entries, and then the last most similar entry of that.

#### `random`
Pick a random sample of 3 words from the same random sense.

### Example
```bash
prodigy sense2vec.eval vectors_eval /path/to/s2v_reddit_2015_md --senses NOUN,ORG,PRODUCT --threshold 0.5
```
```
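
The idea behind the evaluation (the vectors are good if a human mostly agrees with the model on which of B or C is closer to A) can be illustrated with a small sketch. The similarity scores below are invented, and `model_choice`, `agreement_rate`, and `toy_similarity` are hypothetical names; how the recipe actually computes its score is not shown in this document.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


def model_choice(triple: Triple, similarity: Callable[[str, str], float]) -> str:
    """Return whichever of B or C the model scores as more similar to A."""
    a, b, c = triple
    return b if similarity(a, b) >= similarity(a, c) else c


def agreement_rate(
    triples: List[Triple],
    human_choices: List[str],
    similarity: Callable[[str, str], float],
) -> float:
    """Fraction of triples where the human picked the same option as the model."""
    matches = sum(
        model_choice(triple, similarity) == human
        for triple, human in zip(triples, human_choices)
    )
    return matches / len(triples)


# Invented scores standing in for s2v.similarity(...)
toy_scores: Dict[Tuple[str, str], float] = {
    ("cat|NOUN", "dog|NOUN"): 0.8,
    ("cat|NOUN", "car|NOUN"): 0.2,
    ("tea|NOUN", "coffee|NOUN"): 0.7,
    ("tea|NOUN", "brick|NOUN"): 0.1,
}


def toy_similarity(a: str, b: str) -> float:
    return toy_scores[(a, b)]


triples = [
    ("cat|NOUN", "dog|NOUN", "car|NOUN"),
    ("tea|NOUN", "coffee|NOUN", "brick|NOUN"),
]
human_choices = ["dog|NOUN", "brick|NOUN"]  # the human disagrees on the second triple
rate = agreement_rate(triples, human_choices, toy_similarity)
print(f"Agreement: {rate:.2f}")
```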

--------------------------------

### Get Best Sense for a Word

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Find the best-matching sense for a given word. Optionally limit the search to specific senses. Case-insensitive matching is enabled by default.

```python
assert s2v.get_best_sense("duck") == "duck|NOUN"
```

```python
assert s2v.get_best_sense("duck", ["VERB", "ADJ"]) == "duck|VERB"
```
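
As a rough illustration of how such a lookup could work, the sketch below picks the most frequent matching key from a toy frequency table, trying lowercase, uppercase and titlecase variants of the word. The frequencies are invented and `best_sense` is a hypothetical stand-in, not the library's implementation.

```python
from typing import Dict, Optional, Sequence


def best_sense(
    word: str,
    freqs: Dict[str, int],
    senses: Sequence[str],
    ignore_case: bool = True,
) -> Optional[str]:
    """Return the most frequent 'word|SENSE' key found in freqs, or None."""
    versions = {word, word.lower(), word.upper(), word.title()} if ignore_case else {word}
    candidates = [
        f"{text}|{sense}"
        for text in sorted(versions)  # sorted only to make the order deterministic
        for sense in senses
        if f"{text}|{sense}" in freqs
    ]
    return max(candidates, key=freqs.__getitem__) if candidates else None


# Invented frequency counts
toy_freqs = {"duck|NOUN": 500, "duck|VERB": 120, "Duck|PERSON": 40}
print(best_sense("duck", toy_freqs, ["NOUN", "VERB", "PERSON"]))
print(best_sense("duck", toy_freqs, ["VERB", "ADJ"]))
```

Restricting the senses changes the answer because the more frequent noun entry is no longer a candidate.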

--------------------------------

### Integrate Sense2Vec as a spaCy Pipeline Component

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Add sense2vec as a pipeline component to a spaCy model and access word vector information via extension attributes. Requires spaCy v3 and the sense2vec library installed.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
s2v = nlp.add_pipe("sense2vec")
s2v.from_disk("/path/to/s2v_reddit_2015_md")

doc = nlp("A sentence about natural language processing.")
assert doc[3:6].text == "natural language processing"
freq = doc[3:6]._.s2v_freq
vector = doc[3:6]._.s2v_vec
most_similar = doc[3:6]._.s2v_most_similar(3)
# [(('machine learning', 'NOUN'), 0.8986967),
#  (('computer vision', 'NOUN'), 0.8636297),
#  (('deep learning', 'NOUN'), 0.8573361)]
```

--------------------------------

### Run Script with Help Option

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Use the --help flag to view command-line arguments for any script. This is useful for understanding script options and parameters.

```bash
python scripts/01_parse.py --help
```

--------------------------------

### Sense2Vec Get Frequency

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Retrieves the frequency count for a given key, with an optional default value.

```APIDOC
## Sense2Vec.get_freq

### Description
Get the frequency count for a given key.

### Method
`get_freq`

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
- **key** (unicode / int) - Required - The key to look up.
- **default** - Optional - Default value to return if no frequency is found.

### Request Example
```python
vec = s2v["avocado|NOUN"]
s2v.add("🥑|NOUN", vec, 1234)
assert s2v.get_freq("🥑|NOUN") == 1234
```

### Response
#### Success Response (200)
- **frequency** (int) - The frequency count.

#### Response Example
```json
{
  "frequency": 1234
}
```
```

--------------------------------

### Prodigy Recipe: sense2vec.teach for Terminology Bootstrapping

Source: https://context7.com/explosion/sense2vec/llms.txt

Bootstrap terminology lists by suggesting similar terms based on seed phrases. Use the command line for basic and advanced usage, including resuming interrupted processes.

```bash
# Basic usage with seed terms
prodigy sense2vec.teach tech_terms /path/to/s2v_reddit_2015_md \
    --seeds "machine learning, deep learning, neural network" \
    --threshold 0.85

# With additional options
prodigy sense2vec.teach medical_terms /path/to/vectors \
    --seeds "diabetes, hypertension, cardiovascular disease" \
    --threshold 0.80 \
    --n-similar 200 \
    --batch-size 10 \
    --resume  # Continue from existing dataset
```

--------------------------------

### Sense2VecComponent.from_disk

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Loads a Sense2VecComponent from a directory.

```APIDOC
## Sense2VecComponent.from_disk

### Description
Load a `Sense2Vec` object from a directory. Also called when you run `nlp.from_disk`.

### Method
`from_disk`

### Parameters
#### Path Parameters
- **path** (unicode / `Path`) - Required - The path to load from.

### Request Example
```python
# from_disk is an instance method, so create the component first
loaded_component = Sense2VecComponent(nlp.vocab).from_disk("/path/to/model")
```

### Response
#### Success Response (200)
- **Sense2VecComponent** (Sense2VecComponent) - The loaded object.

#### Response Example
```json
{
  "example": "Sense2VecComponent object"
}
```
```

--------------------------------

### Get Frequency of Sense2Vec Key

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Retrieve the frequency count for a given key. A default value can be provided if the key is not found.

```python
vec = s2v["avocado|NOUN"]
s2v.add("🥑|NOUN", vec, 1234)
assert s2v.get_freq("🥑|NOUN") == 1234
```

--------------------------------

### Get Sense2Vec Vector Count

Source: https://github.com/explosion/sense2vec/blob/master/README.md

`len(s2v)` returns the number of rows in the vectors table, i.e. the first dimension of the `shape` passed at initialization.

```python
s2v = Sense2Vec(shape=(300, 128))
assert len(s2v) == 300
```

--------------------------------

### Sense2Vec Serialization - Save and Load Models

Source: https://context7.com/explosion/sense2vec/llms.txt

Illustrates how to save and load Sense2Vec models using `to_disk`, `from_disk`, `to_bytes`, and `from_bytes`. Also shows how to exclude specific fields during disk serialization.

```python
from sense2vec import Sense2Vec
import numpy as np

# Create and populate model
s2v = Sense2Vec(shape=(100, 64), senses=["NOUN", "VERB", "ADJ"])
for i in range(50):
    vec = np.random.rand(64).astype(np.float32)
    s2v.add(f"word_{i}|NOUN", vec, freq=i * 10)

# Save to directory
s2v.to_disk("/path/to/my_vectors")

# Load from directory
loaded_s2v = Sense2Vec().from_disk("/path/to/my_vectors")
assert len(loaded_s2v) == len(s2v)

# Serialize to bytes (useful for network transfer)
bytes_data = s2v.to_bytes()
restored_s2v = Sense2Vec().from_bytes(bytes_data)

# Exclude specific fields during serialization
s2v.to_disk("/path/to/vectors_no_cache", exclude=["cache", "strings"])
```

--------------------------------

### Get other senses for a word with Sense2Vec

Source: https://github.com/explosion/sense2vec/blob/master/README.md

The `Sense2Vec.get_other_senses` method finds entries for the same word but with different senses. By default, it ignores case when searching.

```python
other_senses = s2v.get_other_senses("duck|NOUN")
# ['duck|VERB', 'Duck|ORG', 'Duck|VERB', 'Duck|PERSON', 'Duck|ADJ']
```
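
The case handling can be sketched with a toy table. The example below assumes the default `word|SENSE` key format and skips entries that carry the same sense as the query; `other_senses` and the table contents are hypothetical and only meant to show the idea.

```python
from typing import List, Set


def other_senses(
    key: str, table: Set[str], senses: List[str], ignore_case: bool = True
) -> List[str]:
    """List keys in table for the same word (in any casing) with a different sense."""
    word, orig_sense = key.rsplit("|", 1)
    versions = [word, word.lower(), word.upper(), word.title()] if ignore_case else [word]
    versions = list(dict.fromkeys(versions))  # de-duplicate while keeping order
    return [
        f"{text}|{sense}"
        for text in versions
        for sense in senses
        if sense != orig_sense and f"{text}|{sense}" in table
    ]


toy_table = {"duck|NOUN", "duck|VERB", "Duck|ORG", "Duck|PERSON"}
toy_senses = ["NOUN", "VERB", "ORG", "PERSON"]
print(other_senses("duck|NOUN", toy_table, toy_senses))
print(other_senses("duck|NOUN", toy_table, toy_senses, ignore_case=False))
```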

--------------------------------

### Get all vectors from Sense2Vec

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Convert the vectors iterator returned by `Sense2Vec.values()` into a list to retrieve all numpy ndarray vectors stored in the table.

```python
all_vecs = list(s2v.values())
```

--------------------------------

### Load pretrained vectors with Sense2Vec

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Initialize Sense2Vec and load pretrained vectors from a specified directory. Ensure the directory contains the unpacked vector data.

```python
from sense2vec import Sense2Vec
s2v = Sense2Vec().from_disk("/path/to/s2v_reddit_2015_md")
```

--------------------------------

### Sense2VecComponent.__init__

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Initializes the Sense2VecComponent with a vocabulary, vector shape, and configuration for phrase merging and lemmatization.

```APIDOC
## Sense2VecComponent.__init__

### Description
Initialize the pipeline component.

### Method
`__init__`

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
- **vocab** (Vocab) - Required - The shared `Vocab`. Mostly used for the shared `StringStore`.
- **shape** (tuple) - Optional - The vector shape.
- **merge_phrases** (bool) - Optional - Whether to merge sense2vec phrases into one token. Defaults to `False`.
- **lemmatize** (bool) - Optional - Always look up lemmas if available in the vectors, otherwise default to original word. Defaults to `False`.
- **overrides** (dict) - Optional - Custom functions to use, mapped to names registered via the registry, e.g. `{"make_key": "custom_make_key"}`.

### Request Example
```python
s2v = Sense2VecComponent(nlp.vocab)
```

### Response
#### Success Response (200)
- **Sense2VecComponent** (Sense2VecComponent) - The newly constructed object.

#### Response Example
```json
{
  "example": "Sense2VecComponent object"
}
```
```

--------------------------------

### Initialize Sense2VecComponent from NLP object

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Initialize the component using an existing nlp object. This method is often used as a factory for the component entry point.

```python
s2v = Sense2VecComponent.from_nlp(nlp)
```

--------------------------------

### Serialize Sense2VecComponent to Disk

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Serialize the component to a directory. This is also called when the component is added to the pipeline and nlp.to_disk is run.

```python
s2v.to_disk(path)
```

--------------------------------

### sense2vec.teach

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Bootstrap a terminology list using sense2vec. Prodigy suggests similar terms based on sense2vec vectors, adjusting suggestions as you annotate.

```APIDOC
## sense2vec.teach

### Description
Bootstrap a terminology list using sense2vec. Prodigy will suggest similar terms based on the most similar phrases from sense2vec, and the suggestions will be adjusted as you annotate and accept similar phrases. For each seed term, the best matching sense according to the sense2vec vectors will be used.

### Method
PRODIGY COMMAND

### Endpoint
sense2vec.teach [dataset] [vectors_path] [--seeds] [--threshold] [--n-similar] [--batch-size] [--resume]

### Parameters
#### Positional Arguments
- **dataset** (positional) - Dataset to save annotations to.
- **vectors_path** (positional) - Path to pretrained sense2vec vectors.

#### Options
- **--seeds, -s** (option) - One or more comma-separated seed phrases.
- **--threshold, -t** (option) - Similarity threshold. Defaults to `0.85`.
- **--n-similar, -n** (option) - Number of similar items to get at once.
- **--batch-size, -b** (option) - Batch size for submitting annotations.
- **--resume, -R** (flag) - Resume from an existing phrases dataset.

### Request Example
```bash
prodigy sense2vec.teach tech_phrases /path/to/s2v_reddit_2015_md \
--seeds "natural language processing, machine learning, artificial intelligence"
```

### Response
#### Success Response (200)
- **Annotations** (list) - Saved annotations to the specified dataset.

#### Response Example
(No specific response example provided, output is saved to dataset)
```

--------------------------------

### Sense2VecComponent.to_disk

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Serializes the Sense2VecComponent to a directory.

```APIDOC
## Sense2VecComponent.to_disk

### Description
Serialize the component to a directory. Also called when the component is added to the pipeline and you run `nlp.to_disk`.

### Method
`to_disk`

### Parameters
#### Path Parameters
- **path** (unicode / `Path`) - Required - The path.

### Request Example
```python
nlp.to_disk("/path/to/model")
```

### Response
#### Success Response (200)
None

#### Response Example
None
```

--------------------------------

### Sense2Vec Initialization

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Initializes a Sense2Vec object with specified parameters.

```APIDOC
## Sense2Vec.__init__

### Description
Initialize the `Sense2Vec` object.

### Method
`__init__`

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
- **shape** (tuple) - Optional - The vector shape. Defaults to `(1000, 128)`.
- **strings** (`spacy.strings.StringStore`) - Optional - Optional string store. Will be created if it doesn't exist.
- **senses** (list) - Optional - Optional list of all available senses. Used in methods that generate the best sense or other senses.
- **vectors_name** (unicode) - Optional - Optional name to assign to the `Vectors` table, to prevent clashes. Defaults to `"sense2vec"`.
- **overrides** (dict) - Optional - Optional custom functions to use, mapped to names registered via the registry, e.g. `{"make_key": "custom_make_key"}`.

### Request Example
```python
s2v = Sense2Vec(shape=(300, 128), senses=["VERB", "NOUN"])
```

### Response
#### Success Response (200)
- **object** (`Sense2Vec`) - The newly constructed object.

#### Response Example
```json
{
  "message": "Sense2Vec object created successfully"
}
```
```

--------------------------------

### Load and Query Sense2Vec Model Standalone

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Load a pretrained sense2vec model from disk and query for vectors, frequencies, and most similar terms. Ensure the model path is correct and the query term exists in the model.

```python
from sense2vec import Sense2Vec

s2v = Sense2Vec().from_disk("/path/to/s2v_reddit_2015_md")
query = "natural_language_processing|NOUN"
assert query in s2v
vector = s2v[query]
freq = s2v.get_freq(query)
most_similar = s2v.most_similar(query, n=3)
# [('machine_learning|NOUN', 0.8986967),
#  ('computer_vision|NOUN', 0.8636297),
#  ('deep_learning|NOUN', 0.8573361)]
```

--------------------------------

### Run sense2vec.eval Command

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Use this command to evaluate a sense2vec model. Specify the dataset to save annotations, the path to the pretrained vectors, and optional strategies or filters.

```bash
prodigy sense2vec.eval [dataset] [vectors_path] [--strategy] [--senses]
[--exclude-senses] [--n-freq] [--threshold] [--batch-size] [--eval-whole]
[--eval-only] [--show-scores]
```

--------------------------------

### Sense2VecComponent.from_nlp

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Initializes the Sense2VecComponent from an nlp object, commonly used as a component factory.

```APIDOC
## Sense2VecComponent.from_nlp

### Description
Initialize the component from an nlp object. Mostly used as the component factory for the entry point (see setup.cfg) and to auto-register via the `@spacy.component` decorator.

### Method
`from_nlp` (classmethod)

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
- **nlp** (Language) - Required - The `nlp` object.
- **`**cfg`** (-) - Optional - Config parameters.

### Request Example
```python
s2v = Sense2VecComponent.from_nlp(nlp)
```

### Response
#### Success Response (200)
- **Sense2VecComponent** (Sense2VecComponent) - The newly constructed object.

#### Response Example
```json
{
  "example": "Sense2VecComponent object"
}
```
```

--------------------------------

### Initialize Sense2Vec with Custom Overrides

Source: https://github.com/explosion/sense2vec/blob/master/README.md

When initializing Sense2Vec, pass a dictionary to the 'overrides' argument to use your custom registered functions for 'make_key' and 'split_key'.

```python
overrides = {"make_key": "custom", "split_key": "custom"}
s2v = Sense2Vec(overrides=overrides)
```

--------------------------------

### Train GloVe Vectors

Source: https://github.com/explosion/sense2vec/blob/master/README.md

This script uses the GloVe library to train word vectors. Ensure GloVe is cloned and built before running.

```bash
python scripts/04_glove_train_vectors.py
```

--------------------------------

### Compare two sense2vec models side-by-side

Source: https://context7.com/explosion/sense2vec/llms.txt

Utilize the sense2vec.eval-ab Prodigy recipe for A/B comparison of two vector models. Options include specifying senses, frequency, batch size, and debugging flags.

```bash
prodigy sense2vec.eval-ab comparison_results \
    /path/to/s2v_reddit_2015_md \
    /path/to/s2v_reddit_2019_lg \
    --senses NOUN,ORG,PRODUCT \
    --n-freq 100000 \
    --batch-size 5
```

```bash
prodigy sense2vec.eval-ab comparison /path/to/model_a /path/to/model_b \
    --show-mapping
```

```bash
prodigy sense2vec.eval-ab comparison /path/to/model_a /path/to/model_b \
    --eval-only --eval-whole
```

--------------------------------

### Prodigy Recipe: sense2vec.to-patterns for EntityRuler

Source: https://context7.com/explosion/sense2vec/llms.txt

Convert accepted phrases from sense2vec.teach into spaCy EntityRuler patterns. Supports basic usage, case-sensitive matching, and dry runs to preview patterns.

```bash
# Generate patterns for entity matching
prodigy sense2vec.to-patterns tech_terms en_core_web_sm TECHNOLOGY \
    --output-file ./patterns.jsonl

# Case-sensitive patterns
prodigy sense2vec.to-patterns brand_names en_core_web_sm BRAND \
    --output-file ./brand_patterns.jsonl \
    --case-sensitive

# Dry run to preview patterns
prodigy sense2vec.to-patterns medical_terms en_core_web_sm MEDICAL --dry
```

--------------------------------

### Train custom sense2vec vectors: Build vocabulary for GloVe

Source: https://context7.com/explosion/sense2vec/llms.txt

Step 3 of training custom vectors: Build word count statistics required for GloVe training. Specify input and output directories, and the path to the GloVe build scripts.

```bash
python scripts/03_glove_build_counts.py ./preprocessed/ ./glove/ \
    /path/to/GloVe/build/
```

--------------------------------

### Initialize Sense2VecComponent

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Initialize the pipeline component with a shared Vocab object. This is the primary constructor for the Sense2VecComponent.

```python
s2v = Sense2VecComponent(nlp.vocab)
```

--------------------------------

### Find Similar Terms with Sense2Vec.most_similar

Source: https://context7.com/explosion/sense2vec/llms.txt

Demonstrates using the `most_similar` method to find terms with high cosine similarity to a single key or an average of multiple keys. Also shows how to access frequency rankings.

```python
from sense2vec import Sense2Vec

s2v = Sense2Vec().from_disk("/path/to/vectors")

# Single key query
similar = s2v.most_similar("machine_learning|NOUN", n=10)
for term, score in similar:
    word, sense = s2v.split_key(term)
    print(f"{word} ({sense}): {score:.4f}")

# Multiple keys - uses average vector
combined_similar = s2v.most_similar(
    ["artificial_intelligence|NOUN", "deep_learning|NOUN"],
    n=5,
    batch_size=32
)
print("\nSimilar to AI + Deep Learning combined:")
for term, score in combined_similar:
    print(f"  {term}: {score:.4f}")

# Using frequency rankings
top_terms = s2v.frequencies[:100]  # Most frequent (key, freq) tuples
for key, freq in top_terms[:5]:
    print(f"{key}: {freq} occurrences")
```
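
To see what "average vector" means for a multi-key query, here is a small pure-Python sketch with two-dimensional toy vectors. It assumes the query keys themselves are left out of the results; `cosine`, `average`, and `toy_most_similar` are hypothetical helpers, and the real method works on the library's vector table rather than on lists like these.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def average(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of the given vectors."""
    return [sum(dims) / len(vectors) for dims in zip(*vectors)]


def toy_most_similar(
    keys: Sequence[str], table: Dict[str, Vector], n: int = 3
) -> List[Tuple[str, float]]:
    """Rank all non-query keys by cosine similarity to the averaged query vector."""
    query = average([table[key] for key in keys])
    scored = [(key, cosine(query, vec)) for key, vec in table.items() if key not in keys]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:n]


toy_table = {
    "artificial_intelligence|NOUN": [1.0, 0.0],
    "deep_learning|NOUN": [0.0, 1.0],
    "machine_learning|NOUN": [1.0, 1.0],
    "gardening|NOUN": [-1.0, 0.0],
}
result = toy_most_similar(
    ["artificial_intelligence|NOUN", "deep_learning|NOUN"], toy_table, n=2
)
for key, score in result:
    print(f"{key}: {score:.4f}")
```

In this toy table the averaged query points halfway between the two seed vectors, so the entry lying in that direction ranks first.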

--------------------------------

### Sense2Vec Registry Customization

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Demonstrates how to register custom functions for key generation and splitting within the Sense2Vec registry.

```APIDOC
## Sense2Vec Registry Customization

### Description
This section explains how to customize the functions used by Sense2Vec for generating and splitting keys, and how to apply these customizations when initializing the Sense2Vec model.

### Registry Functions
- `registry.make_key`: Given a `word` and `sense`, return a string of the key, e.g. `"word|sense"`.
- `registry.split_key`: Given a string key, return a `(word, sense)` tuple.
- `registry.make_spacy_key`: Given a spaCy object (`Token` or `Span`) and a boolean `prefer_ents` keyword argument, return a `(word, sense)` tuple.
- `registry.get_phrases`: Given a spaCy `Doc`, return a list of `Span` objects used for sense2vec phrases.
- `registry.merge_phrases`: Given a spaCy `Doc`, get all sense2vec phrases and merge them into single tokens.

### Registering Custom Functions
Use the `register` method as a decorator to add custom functions to the registry.

```python
from sense2vec import registry

@registry.make_key.register("custom")
def custom_make_key(word, sense):
    return f"{word}###{sense}"

@registry.split_key.register("custom")
def custom_split_key(key):
    word, sense = key.split("###")
    return word, sense
```

### Applying Customizations
Pass a dictionary of overrides to the `Sense2Vec` constructor to use your registered custom functions.

```python
overrides = {"make_key": "custom", "split_key": "custom"}
s2v = Sense2Vec(overrides=overrides)
```
```

--------------------------------

### Deserialize Sense2VecComponent from Disk

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Load a Sense2Vec object from a directory. This is also called when nlp.from_disk is run.

```python
# from_disk is an instance method, so create the component first
loaded_s2v = Sense2VecComponent(nlp.vocab).from_disk(path)
```

--------------------------------

### Train FastText Vectors

Source: https://github.com/explosion/sense2vec/blob/master/README.md

This script uses the FastText library to train word vectors. Ensure FastText is cloned and built before running.

```bash
python scripts/04_fasttext_train_vectors.py
```

--------------------------------

### Train custom sense2vec vectors: Preprocess to sense2vec format

Source: https://context7.com/explosion/sense2vec/llms.txt

Step 2 of training custom vectors: Convert parsed text into the sense2vec format. Specify the input directory containing parsed files and the output directory for preprocessed files.

```bash
python scripts/02_preprocess.py ./parsed/ ./preprocessed/
```

--------------------------------

### Initialize Sense2Vec Object

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Initialize the Sense2Vec object with a specified shape and optional senses. Defaults are used if not provided.

```python
s2v = Sense2Vec(shape=(300, 128), senses=["VERB", "NOUN"])
```

--------------------------------

### Standalone Sense2Vec Usage

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Instantiate the Sense2Vec class directly and load vectors using from_disk. Keys for lookup must follow the 'phrase_text|SENSE' format, and the table is case-sensitive.

```python
from sense2vec import Sense2Vec
s2v = Sense2Vec().from_disk("/path/to/reddit_vectors-1.1.0")
most_similar = s2v.most_similar("natural_language_processing|NOUN", n=10)
```
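
A short sketch of what the key format and case sensitivity mean in practice, assuming whitespace in the phrase is replaced by underscores before the sense is appended. `make_default_key` and the one-entry table are hypothetical and stand in for a real vectors table.

```python
import re


def make_default_key(phrase: str, sense: str) -> str:
    """Join phrase and sense as 'phrase_text|SENSE', replacing whitespace with '_'."""
    return re.sub(r"\s", "_", phrase) + "|" + sense


# A one-entry stand-in for the keys of a vectors table
toy_table = {"natural_language_processing|NOUN"}

key = make_default_key("natural language processing", "NOUN")
print(key, key in toy_table)

# Same phrase with different casing: a different key, so the lookup misses
cased_key = make_default_key("Natural Language Processing", "NOUN")
print(cased_key, cased_key in toy_table)
```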

--------------------------------

### Evaluate sense2vec model quality

Source: https://context7.com/explosion/sense2vec/llms.txt

Use the sense2vec.eval Prodigy recipe to assess model quality by comparing phrase triples. Specify senses and a similarity threshold.

```bash
prodigy sense2vec.eval eval_results /path/to/vectors \
    --senses NOUN,VERB,ORG \
    --threshold 0.5
```

```bash
prodigy sense2vec.eval eval_data /path/to/vectors \
    --strategy most_similar \
    --n-freq 50000 \
    --batch-size 10
```

```bash
prodigy sense2vec.eval eval_data /path/to/vectors --eval-only
```

--------------------------------

### Customize Sense2Vec Key Functions with Registry

Source: https://context7.com/explosion/sense2vec/llms.txt

Register custom functions for creating and splitting keys using a different delimiter. This allows for flexible encoding schemes.

```python
from sense2vec import Sense2Vec, registry

# Register custom key format using ":::" instead of "|"
@registry.make_key.register("custom_format")
def custom_make_key(word, sense):
    return f"{word.replace(' ', '_')}:::{sense}"

@registry.split_key.register("custom_format")
def custom_split_key(key):
    if ":::" not in key:
        raise ValueError(f"Invalid key format: {key}")
    word, sense = key.rsplit(":::", 1)
    return word.replace("_", " "), sense

# Use custom functions with Sense2Vec
s2v = Sense2Vec(
    shape=(100, 64),
    overrides={"make_key": "custom_format", "split_key": "custom_format"}
)

# Keys now use custom format
import numpy as np
vec = np.random.rand(64).astype(np.float32)
s2v.add("hello_world:::NOUN", vec, freq=100)

# Verify custom format works
assert "hello_world:::NOUN" in s2v
word, sense = s2v.split_key("hello_world:::NOUN")
print(f"Word: {word}, Sense: {sense}")  # Word: hello world, Sense: NOUN
```

--------------------------------

### Register Custom Key Generation Function

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Use the registry.make_key.register decorator to define a custom function for generating keys. This function takes a word and sense, returning a string key.

```python
from sense2vec import registry

@registry.make_key.register("custom")
def custom_make_key(word, sense):
    return f"{word}###{sense}"
```

--------------------------------

### Train custom sense2vec vectors: Export to sense2vec format

Source: https://context7.com/explosion/sense2vec/llms.txt

Step 5 of training custom vectors: Export the trained vectors (e.g., from GloVe) into the sense2vec format. Specify the input vector location and the output directory.

```bash
python scripts/05_export.py ./glove/ ./s2v_output/ --vectors-loc ./glove/vectors.txt
```

--------------------------------

### Load and use trained sense2vec vectors

Source: https://context7.com/explosion/sense2vec/llms.txt

Load custom trained sense2vec vectors from disk and find the most similar terms to a given input. The input term should include its sense (e.g., 'term|NOUN').

```python
from sense2vec import Sense2Vec

s2v = Sense2Vec().from_disk("./s2v_output")
print(f"Loaded {len(s2v)} vectors")
print(f"Available senses: {s2v.senses}")

# Test with domain-specific terms
results = s2v.most_similar("your_domain_term|NOUN", n=10)
for term, score in results:
    print(f"{term}: {score:.4f}")
```
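To make the `(term, score)` pairs easier to interpret, here is a minimal standard-library sketch of ranking entries by cosine similarity over a toy table. It assumes the score is a cosine similarity between vectors; the table, its three-dimensional vectors, and the helper functions are made up for illustration and are not the library's implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(
    table: Dict[str, List[float]], key: str, n: int = 10
) -> List[Tuple[str, float]]:
    """Return the n entries closest to `key`, leaving out the query key itself."""
    query = table[key]
    scored = [(other, cosine(query, vec)) for other, vec in table.items() if other != key]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical toy vectors, chosen only to make the ranking obvious
toy_table = {
    "machine_learning|NOUN": [0.9, 0.1, 0.0],
    "deep_learning|NOUN": [0.8, 0.2, 0.1],
    "banana|NOUN": [0.0, 0.1, 0.9],
}

results = most_similar(toy_table, "machine_learning|NOUN", n=2)
for term, score in results:
    print(f"{term}: {score:.4f}")
```

Here `deep_learning|NOUN` ranks first because its vector points in nearly the same direction as the query, while `banana|NOUN` scores close to zero.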

--------------------------------

### Convert Phrases to Patterns with sense2vec.to-patterns

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Convert a dataset of phrases into token-based match patterns for spaCy's EntityRuler or other NER recipes. Patterns are written to stdout by default if no output file is specified. Tokenization ensures multi-token terms are handled correctly.

```bash
prodigy sense2vec.to-patterns tech_phrases en_core_web_sm TECHNOLOGY \
    --output-file /path/to/patterns.jsonl
```
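The shape of the resulting patterns can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the recipe's code: it splits on whitespace instead of using a spaCy tokenizer, and the helper name `phrase_to_pattern` and the use of `ORTH` for the case-sensitive variant are choices made for this sketch.

```python
import json
from typing import Dict, List


def phrase_to_pattern(phrase: str, label: str, case_sensitive: bool = False) -> Dict:
    """Turn one phrase into a token-based match pattern with one dict per token."""
    # Assumption: whitespace splitting stands in for real tokenization
    tokens: List[str] = phrase.split()
    if case_sensitive:
        pattern = [{"ORTH": token} for token in tokens]
    else:
        pattern = [{"LOWER": token.lower()} for token in tokens]
    return {"label": label, "pattern": pattern}


phrases = ["natural language processing", "New Balance"]
# One JSON object per line, matching the JSONL layout of a patterns file
lines = [json.dumps(phrase_to_pattern(p, "TECHNOLOGY")) for p in phrases]
print("\n".join(lines))
```

Emitting one token dict per word is what lets a multi-word term match as a sequence of tokens instead of a single string.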

--------------------------------

### Deserialize Sense2Vec from Disk

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Load a Sense2Vec object from a directory path. This is the counterpart to `to_disk` for restoring models. Pass `exclude` with a list of serialization field names to skip them during loading.

```python
s2v.to_disk("/path/to/sense2vec")
new_s2v = Sense2Vec().from_disk("/path/to/sense2vec")
```

--------------------------------

### Serialize Sense2VecComponent to Bytes

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Serialize the component to a bytestring. This is also called when the component is added to the pipeline and nlp.to_bytes is run.

```python
component_bytes = s2v.to_bytes()
```

--------------------------------

### Train custom sense2vec vectors: Train vectors with GloVe

Source: https://context7.com/explosion/sense2vec/llms.txt

Step 4a of training custom vectors: Train word vectors using the GloVe algorithm. Configure the number of threads and training iterations.

```bash
python scripts/04_glove_train_vectors.py ./glove/ /path/to/GloVe/build/ \
    --n-threads 8 --n-iter 15
```

--------------------------------

### Sense2VecComponent.from_bytes

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Loads a Sense2VecComponent from a bytestring.

```APIDOC
## Sense2VecComponent.from_bytes

### Description
Load a component from a bytestring. Also called when you run `nlp.from_bytes`.

### Method
`from_bytes`

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
- **bytes_data** (bytes) - Required - The data to load.

### Request Example
```python
# from_bytes is an instance method; assumes `nlp` is an existing spaCy pipeline
loaded_component = Sense2VecComponent(nlp.vocab).from_bytes(bytes_data)
```

### Response
#### Success Response (200)
- **Sense2VecComponent** (Sense2VecComponent) - The loaded object.

#### Response Example
```json
{
  "example": "Sense2VecComponent object"
}
```
```

--------------------------------

### Add EntityRuler with patterns from disk

Source: https://context7.com/explosion/sense2vec/llms.txt

Load custom entity patterns from a JSONL file into a spaCy EntityRuler. Ensure the 'entity_ruler' pipe is added before 'ner'.

```python
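import spacy

nlp = spacy.load("en_core_web_sm")
# Add the ruler before "ner" so its matches are set before the statistical model runs
ruler = nlp.add_pipe("entity_ruler", before="ner")
# Load the JSONL patterns, e.g. a file exported by sense2vec.to-patterns
ruler.from_disk("./patterns.jsonl")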

doc = nlp("We use machine learning and neural networks.")
for ent in doc.ents:
    print(f"{ent.text}: {ent.label_}")
```

--------------------------------

### Deserialize Sense2VecComponent from Bytes

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Load a component from a bytestring. This is also called when nlp.from_bytes is run.

```python
# from_bytes is an instance method; assumes `nlp` is an existing spaCy pipeline
loaded_s2v = Sense2VecComponent(nlp.vocab).from_bytes(bytes_data)
```

--------------------------------

### Merge multi-part archives

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Use the 'cat' command to merge split tar.gz files into a single archive. Ensure all parts are in the same directory.

```bash
cat s2v_reddit_2019_lg.tar.gz.* > s2v_reddit_2019_lg.tar.gz
```
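Where `cat` is not available (for example on Windows), the same concatenation can be done from Python. The following is a minimal sketch using only the standard library; the helper name `merge_parts` and the demo file names are hypothetical, and the demo runs against small temporary files rather than real archive parts.

```python
import glob
import os
import shutil
import tempfile
from typing import List


def merge_parts(pattern: str, output_path: str) -> List[str]:
    """Concatenate all files matching `pattern`, in sorted order, into `output_path`."""
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"No parts match {pattern}")
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream each part so large archives are never held in memory at once
                shutil.copyfileobj(src, out)
    return parts


# Demo with three small stand-in parts in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for suffix, chunk in (("001", b"first-"), ("002", b"second-"), ("003", b"third")):
        with open(os.path.join(tmp, f"archive.tar.gz.{suffix}"), "wb") as f:
            f.write(chunk)
    target = os.path.join(tmp, "archive.tar.gz")
    merge_parts(os.path.join(tmp, "archive.tar.gz.*"), target)
    with open(target, "rb") as f:
        merged = f.read()

print(merged)  # b'first-second-third'
```

Sorting the part names matters: the pieces must be joined in their original order for the merged archive to be valid.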

--------------------------------

### Evaluate Sense2Vec Most Similar Entries

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Use this command to evaluate a sense2vec model by checking its most similar entries for a given phrase. Specify the dataset, path to vectors, and optionally filter by senses.

```bash
prodigy sense2vec.eval-most-similar vectors_eval_sim /path/to/s2v_reddit_2015_md \
    --senses NOUN,ORG,PRODUCT
```

--------------------------------

### Train custom sense2vec vectors: Train vectors with FastText

Source: https://context7.com/explosion/sense2vec/llms.txt

Step 4b of training custom vectors: Alternatively, train word vectors using the FastText algorithm. Specify input and output directories, and the path to the FastText executable.

```bash
python scripts/04_fasttext_train_vectors.py ./preprocessed/ ./fasttext/ \
    /path/to/fasttext --n-threads 8
```

--------------------------------

### Training Custom Sense2Vec Vectors

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Outlines the requirements for training your own Sense2Vec vectors.

```APIDOC
## Training Custom Sense2Vec Vectors

### Description
This section details the necessary components and tools required to train your own sense2vec vectors from scratch.

### Requirements

- **Large Text Corpus**: A very large source of raw text (ideally more than 1 billion words) is recommended due to the sparsity introduced by senses.
- **Pretrained spaCy Model**: A spaCy model that provides part-of-speech tags, dependencies, named entities, and populates `doc.noun_chunks`. If noun phrase extraction is not built-in for your language, you will need to implement a custom syntax iterator.
- **Vector Training Library**: GloVe or fastText installed and built. You should be able to clone their respective repositories and run `make`.
```

--------------------------------

### Train custom sense2vec vectors: Parsing raw text

Source: https://context7.com/explosion/sense2vec/llms.txt

Step 1 of training custom vectors: Parse raw text using a spaCy model. Specify input and output directories, and the spaCy model to use. Multiprocessing is supported.

```bash
python scripts/01_parse.py ./raw_text/ ./parsed/ en_core_web_lg \
    --n-process 4
```

--------------------------------

### Train custom sense2vec vectors: Precompute nearest neighbors cache

Source: https://context7.com/explosion/sense2vec/llms.txt

Step 6 (optional) of training custom vectors: Precompute a cache of nearest neighbors for faster lookups. Specify the output directory and the number of neighbors to cache.

```bash
python scripts/06_precompute_cache.py ./s2v_output/ --n-neighbors 100
```

--------------------------------

### sense2vec.to-patterns

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Convert a dataset of phrases to token-based match patterns for spaCy's EntityRuler or other NER recipes.

```APIDOC
## sense2vec.to-patterns

### Description
Convert a dataset of phrases collected with `sense2vec.teach` to token-based match patterns that can be used with [spaCy's `EntityRuler`](https://spacy.io/usage/rule-based-matching#entityruler) or recipes like `ner.match`. If no output file is specified, the patterns are written to stdout. The examples are tokenized so that multi-token terms are represented correctly, e.g.: `{"label": "SHOE_BRAND", "pattern": [{ "LOWER": "new" }, { "LOWER": "balance" }]}`.

### Method
PRODIGY COMMAND

### Endpoint
sense2vec.to-patterns [dataset] [spacy_model] [label] [--output-file] [--case-sensitive] [--dry]

### Parameters
#### Positional Arguments
- **dataset** (positional) - Phrase dataset to convert.
- **spacy_model** (positional) - spaCy model for tokenization.
- **label** (positional) - Label to apply to all patterns.

#### Options
- **--output-file, -o** (option) - Optional output file. Defaults to stdout.
- **--case-sensitive, -CS** (flag) - Make patterns case-sensitive.
- **--dry, -D** (flag) - Perform a dry run and don't output anything.

### Request Example
```bash
prodigy sense2vec.to-patterns tech_phrases en_core_web_sm TECHNOLOGY --output-file /path/to/patterns.jsonl
```

### Response
#### Success Response (200)
- **Patterns** (JSONL) - Token-based match patterns written to stdout or specified file.

#### Response Example
```json
{
  "label": "TECHNOLOGY",
  "pattern": [
    { "LOWER": "natural" },
    { "LOWER": "language" },
    { "LOWER": "processing" }
  ]
}
```
```

--------------------------------

### Serialize Sense2Vec to Disk

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Save a Sense2Vec object to a specified directory path so the model can be restored later with `from_disk`. Pass `exclude` with a list of serialization field names to leave them out of the saved data.

```python
s2v.to_disk("/path/to/sense2vec")
```

--------------------------------

### Sense2VecComponent.__call__

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Processes a Doc object with the Sense2VecComponent, typically as part of the spaCy pipeline.

```APIDOC
## Sense2VecComponent.__call__

### Description
Process a `Doc` object with the component. Typically only called as part of the spaCy pipeline and not directly.

### Method
`__call__`

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
- **doc** (Doc) - Required - The document to process.

### Request Example
```python
processed_doc = s2v(doc)
```

### Response
#### Success Response (200)
- **Doc** (Doc) - the processed document.

#### Response Example
```json
{
  "example": "Processed Doc object"
}
```
```

--------------------------------

### Register Custom Key Splitting Function

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Use the registry.split_key.register decorator to define a custom function for splitting keys back into word and sense. This function takes a key string and returns a (word, sense) tuple.

```python
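from sense2vec import registry

@registry.split_key.register("custom")
def custom_split_key(key):
    # Split on the last "###" so a delimiter inside the word does not break unpacking
    word, sense = key.rsplit("###", 1)
    return word, sense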
```

--------------------------------

### Add Sense2Vec to spaCy Pipeline Config

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Configure a spaCy pipeline to include a Sense2Vec component by specifying the data path in the [initialize.components.sense2vec] section of the training config.

```ini
[initialize.components]

[initialize.components.sense2vec]
data_path = "/path/to/s2v_reddit_2015_md"
```

--------------------------------

### Process Document with Sense2VecComponent

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Process a Doc object with the component. This method is typically invoked as part of the spaCy pipeline.

```python
doc = s2v(doc)
```

--------------------------------

### sense2vec.eval

Source: https://github.com/explosion/sense2vec/blob/master/README.md

Evaluate a sense2vec model by asking about phrase triples.

```APIDOC
## sense2vec.eval

### Description
Evaluate a sense2vec model by asking about phrase triples.

### Method
PRODIGY COMMAND

### Endpoint
sense2vec.eval [dataset] [vectors_path]

### Parameters
#### Positional Arguments
- **dataset** (positional) - Dataset to save annotations to.
- **vectors_path** (positional) - Path to pretrained sense2vec vectors.

### Request Example
```bash
prodigy sense2vec.eval eval_phrases /path/to/s2v_reddit_2015_md
```

### Response
#### Success Response (200)
- **Evaluation Results** (dict) - Results of the model evaluation.

#### Response Example
(No specific response example provided, output is evaluation results)
```