### Example: Packaging and Installing a Pipeline

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/cli.mdx

This example demonstrates how to package a pipeline and then install it using pip. After running the package command, navigate to the created directory and install the generated .tar.gz archive.

```bash
python -m spacy package /input /output
cd /output/en_pipeline-0.0.0
pip install dist/en_pipeline-0.0.0.tar.gz
```

--------------------------------

### Start spaCy Website Development Server

Source: https://github.com/explosion/spacy/blob/master/website/README.md

After installing dependencies with `npm install`, start the development server to preview changes locally.

```bash
npm run dev
```

--------------------------------

### Example Training Command and Output

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/top-level.mdx

This shows the command to start training with a spaCy config file and an example of the console output, including loss and accuracy metrics. Note the cumulative loss behavior within an epoch.

```bash
$ python -m spacy train config.cfg

```

```text
ℹ Using CPU
ℹ Loading config and nlp from: config.cfg
ℹ Pipeline: ['tok2vec', 'tagger']
ℹ Start training
ℹ Training. Initial learn rate: 0.0
ℹ Saving results to training_log.jsonl

E     #        LOSS TOK2VEC   LOSS TAGGER   TAG_ACC   SCORE
---   ------   ------------   -----------   -------   ------
  0        0           0.00         86.20      0.22     0.00
  0      200           3.08      18968.78     34.00     0.34
  0      400          31.81      22539.06     33.64     0.34
  0      600          92.13      22794.91     43.80     0.44
  0      800         183.62      21541.39     56.05     0.56
  0     1000         352.49      25461.82     65.15     0.65
  0     1200         422.87      23708.82     71.84     0.72
  0     1400         601.92      24994.79     76.57     0.77
  0     1600         662.57      22268.02     80.20     0.80
  0     1800        1101.50      28413.77     82.56     0.83
  0     2000        1253.43      28736.36     85.00     0.85
  0     2200        1411.02      28237.53     87.42     0.87
  0     2400        1605.35      28439.95     88.70     0.89

```

--------------------------------

### spacy.load under the hood (abstract example)

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/processing-pipelines.mdx

This abstract example demonstrates the internal steps spaCy takes when loading a pipeline, including getting the language class, initializing it, adding components, and loading data.

```python
lang = "en"
pipeline = ["tok2vec", "tagger", "parser", "ner", "attribute_ruler", "lemmatizer"]
data_path = "path/to/en_core_web_sm/en_core_web_sm-3.0.0"

cls = spacy.util.get_lang_class(lang)  # 1. Get Language class, e.g. English
nlp = cls()                            # 2. Initialize it
for name in pipeline:
    nlp.add_pipe(name, config={...})   # 3. Add the component to the pipeline
nlp.from_disk(data_path)               # 4. Load in the binary data
```
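
The four steps can be tried end to end without a downloaded package. The following is a minimal sketch, assuming a blank English pipeline with only the rule-based `sentencizer` component and a temporary directory standing in for the package data path. Neither choice comes from the original example, and the real `spacy.load` also reads the package's config and metadata.

```python
import tempfile

import spacy

lang = "en"
pipeline = ["sentencizer"]  # assumption: a rule-based component that needs no trained weights

with tempfile.TemporaryDirectory() as data_path:
    # Stand-in for an installed package: build a pipeline and serialize it.
    source_nlp = spacy.blank(lang)
    for name in pipeline:
        source_nlp.add_pipe(name)
    source_nlp.to_disk(data_path)

    # The same four steps, run by hand.
    cls = spacy.util.get_lang_class(lang)  # 1. Get Language class
    nlp = cls()                            # 2. Initialize it
    for name in pipeline:
        nlp.add_pipe(name)                 # 3. Add the component to the pipeline
    nlp.from_disk(data_path)               # 4. Load the serialized data

doc = nlp("This is a sentence. This is another one.")
sentences = [sent.text for sent in doc.sents]
```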

--------------------------------

### Print spaCy Installation Info

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/cli.mdx

Use this command to display details about your spaCy installation, trained pipelines, and local setup. The `--markdown` flag formats the output for easy copy-pasting into GitHub issues.

```bash
$ python -m spacy info [--markdown] [--silent] [--exclude]
```

```bash
$ python -m spacy info en_core_web_lg --markdown
```

--------------------------------

### Configure Few-Shot Prompt Examples via Callback

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/large-language-models.mdx

Initialize the LLM component with a `get_examples` callback and set `n_prompt_examples` to automatically fetch examples for few-shot learning. A value of -1 uses all available examples.

```ini
[initialize.components.llm]
n_prompt_examples = 3
```

--------------------------------

### Get spaCy Installation Info

Source: https://github.com/explosion/spacy/blob/master/CONTRIBUTING.md

Use this command to print details about your spaCy installation and environment, formatted as Markdown for easy copy-pasting into GitHub issues.

```bash
python -m spacy info --markdown
```

--------------------------------

### Install Requirements and Assets

Source: https://github.com/explosion/spacy/blob/master/examples/README.md

Navigate to the project directory and install dependencies, then download necessary data assets.

```bash
cd ner_demo
python -m pip install -r requirements.txt
python -m spacy project assets
```

--------------------------------

### Download and load a spaCy pipeline

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/index.mdx

After installing spaCy, download a trained pipeline and load it for use. This example shows downloading the 'en_core_web_sm' model.

```bash
python -m spacy download en_core_web_sm
```

```python
import spacy
nlp = spacy.load("en_core_web_sm")
```

--------------------------------

### Initialize Method for Model Setup

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/layers-architectures.mdx

Initializes the model by either using explicitly provided labels or by inferring them from training examples. It then calls the model's initialize method with a sample of data.

```python
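from itertools import islice
from typing import Callable, Iterable, List, Optional

from spacy.language import Language
from spacy.training import Example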

def initialize(
    self,
    get_examples: Callable[[], Iterable[Example]],
    *,
    nlp: Language = None,
    labels: Optional[List[str]] = None,
):
    if labels is not None:
        for label in labels:
            self.add_label(label)
    else:
        for example in get_examples():
            relations = example.reference._.rel
            for indices, label_dict in relations.items():
                for label in label_dict.keys():
                    self.add_label(label)
    subbatch = list(islice(get_examples(), 10))
    doc_sample = [eg.reference for eg in subbatch]
    label_sample = self._examples_to_truth(subbatch)
    self.model.initialize(X=doc_sample, Y=label_sample)
```

--------------------------------

### Example Class Initialization

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/example.mdx

Constructs an Example object from predicted and reference documents. Alignment can be optionally provided.

```APIDOC
## Example.__init__

### Description
Construct an `Example` object from the `predicted` document and the `reference` document. If `alignment` is `None`, it will be initialized from the words in both documents.

### Method
`__init__`

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
- **predicted** (Doc) - Required - The document containing (partial) predictions.
- **reference** (Doc) - Required - The document containing gold-standard annotations.
- **alignment** (Optional[Alignment]) - Optional - An object holding the alignment between the tokens of the `predicted` and `reference` documents.

### Request Example
```python
import spacy
from spacy.tokens import Doc
from spacy.training import Example

# Load a trained pipeline; only its shared vocab is used below
nlp = spacy.load("en_core_web_sm")

pred_words = ["Apply", "some", "sunscreen"]
pred_spaces = [True, True, False]
gold_words = ["Apply", "some", "sun", "screen"]
gold_spaces = [True, True, False, False]
gold_tags = ["VERB", "DET", "NOUN", "NOUN"]

predicted = Doc(nlp.vocab, words=pred_words, spaces=pred_spaces)
reference = Doc(nlp.vocab, words=gold_words, spaces=gold_spaces, tags=gold_tags)

example = Example(predicted, reference)
```

### Response
#### Success Response (200)
- **Example** (Example) - The newly constructed Example object.

#### Response Example
(No specific response example provided for constructor, but the object is created.)
```

--------------------------------

### Installation and Model Download

Source: https://context7.com/explosion/spacy/llms.txt

Instructions for installing spaCy and downloading trained pipeline models.

```APIDOC
## Installation

Install spaCy and download a trained pipeline model.

```bash
# Install spaCy
pip install -U pip setuptools wheel
pip install spacy

# Download English pipeline model
python -m spacy download en_core_web_sm

# Alternative: Download larger model with word vectors
python -m spacy download en_core_web_md
```
```

--------------------------------

### Tagger Configuration Example

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/tagger.mdx

Configuration example for initializing the Tagger component, specifying label data path.

```ini
### config.cfg
[initialize.components.tagger]

[initialize.components.tagger.labels]
@readers = "spacy.read_labels.v1"
path = "corpus/labels/tagger.json"
```

--------------------------------

### Install spacy-experimental

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/coref.mdx

Install the spacy-experimental package to use the CoreferenceResolver component.

```bash
$ pip install -U spacy-experimental
```

--------------------------------

### DependencyParser Configuration Example

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/dependencyparser.mdx

Configuration example for the parser component, specifying label data path.

```ini
### config.cfg
[initialize.components.parser]

[initialize.components.parser.labels]
@readers = "spacy.read_labels.v1"
path = "corpus/labels/parser.json"
```

--------------------------------

### Configure FewShotReader for examples

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/large-language-models.mdx

Configure the FewShotReader to load examples from a YAML, JSON, or JSONL file. Specify the path to the examples file. This is useful for few-shot learning scenarios.

```ini
[components.llm.task.examples]
@misc = "spacy.FewShotReader.v1"
path = "ner_examples.yml"
```

--------------------------------

### Example Raw Task Configuration

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/large-language-models.mdx

This is an example configuration for the spacy.Raw.v1 task. It specifies the task and sets examples to null, indicating no few-shot examples are provided in this configuration.

```ini
[components.llm.task]
@llm_tasks = "spacy.Raw.v1"
examples = null

```

--------------------------------

### Install and Verify spacy-huggingface-hub

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/projects.mdx

Install the spacy-huggingface-hub package to add Hugging Face Hub integration to the spaCy CLI. Verify the installation by checking the help command.

```bash
$ pip install spacy-huggingface-hub
# Check that the CLI is registered
$ python -m spacy huggingface-hub --help
```

--------------------------------

### Get spaCy Installation and Pipeline Info

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/top-level.mdx

Prints information about your spaCy installation, installed pipelines, and local setup. Can be used to display info for a specific model or in Markdown format.

```python
spacy.info()
spacy.info("en_core_web_sm")
markdown = spacy.info(markdown=True, silent=True)
```

--------------------------------

### spacy.info

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/top-level.mdx

Provides information about the spaCy installation, installed pipelines, and local setup.

```APIDOC
## spacy.info

### Description
The same as the [`info` command](/api/cli#info). Pretty-print information about your installation, installed pipelines and local setup from within spaCy.

### Method
`spacy.info(model=None, *, markdown=False, silent=False)`

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Request Example
```python
spacy.info()
spacy.info("en_core_web_sm")
markdown = spacy.info(markdown=True, silent=True)
```

### Response
#### Success Response (200)
- **str** - Information about the spaCy installation and pipelines. With `markdown=True`, the Markdown-formatted text is returned, including when `silent=True` suppresses printing.

#### Response Example
```json
{
  "info": "... spaCy installation details ..."
}
```
```

--------------------------------

### Initialize Example with Predicted and Reference Docs

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/example.mdx

Construct an Example object from predicted and reference Doc objects. If alignment is not provided, it will be initialized automatically.

```python
import spacy
from spacy.tokens import Doc
from spacy.training import Example

nlp = spacy.blank("en")  # any pipeline works here; only its vocab is used
pred_words = ["Apply", "some", "sunscreen"]
pred_spaces = [True, True, False]
gold_words = ["Apply", "some", "sun", "screen"]
gold_spaces = [True, True, False, False]
gold_tags = ["VERB", "DET", "NOUN", "NOUN"]
predicted = Doc(nlp.vocab, words=pred_words, spaces=pred_spaces)
reference = Doc(nlp.vocab, words=gold_words, spaces=gold_spaces, tags=gold_tags)
example = Example(predicted, reference)
```

--------------------------------

### ConsoleLogger Configuration Example

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/legacy.mdx

Example configuration for using the spacy.ConsoleLogger.v1 in a training setup.

```APIDOC
## Configuration Example

### Training Configuration

```ini
[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = true
```
```

--------------------------------

### Install spaCy Lookup Data

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/v2-2.mdx

To use lemmatization for languages that only ship a tokenizer, install the lookup data explicitly with pip (the `spacy-lookups-data` package, also available through the `spacy[lookups]` extra). No additional setup is required after installation.

```python
from spacy.lang.tr import Turkish

nlp = Turkish()
doc = nlp("Bu bir cümledir.")
# 🚨 This now requires the lookups data to be installed explicitly
print([token.lemma_ for token in doc])
```

--------------------------------

### TrainablePipe.initialize

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/pipe.mdx

Initialize the component for training. get_examples should be a function that returns an iterable of Example objects.

```APIDOC
## TrainablePipe.initialize

### Description
Initialize the component for training. `get_examples` should be a function that returns an iterable of [`Example`](/api/example) objects. The data examples are used to **initialize the model** of the component and can either be the full training data or a representative sample. Initialization includes validating the network, [inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and setting up the label scheme based on the data. This method is typically called by [`Language.initialize`](/api/language#initialize).

<Infobox variant="warning" title="Changed in v3.0" id="begin_training">

This method was previously called `begin_training`.

</Infobox>

> #### Example
>
> ```python
> pipe = nlp.add_pipe("your_custom_pipe")
> pipe.initialize(lambda: examples, nlp=nlp)
> ```

### Parameters
#### Request Body
- **get_examples** (Callable[[], Iterable[Example]]) - Required - Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects.
- **nlp** (Optional[Language]) - Optional - The current `nlp` object. Defaults to `None`.
```

--------------------------------

### Get Installed Package Path

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/top-level.mdx

Get the file path to an installed package. This function is mainly used to resolve the location of pipeline packages and currently imports the package to find its path. Use this to locate package directories.

```python
from spacy import util

util.get_package_path("en_core_web_sm")
# /usr/lib/python3.6/site-packages/en_core_web_sm
```
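
As a quick check of the return value, here is a minimal sketch that resolves the path of the `spacy` package itself. Using `"spacy"` as the package name is an assumption made only so the snippet runs without a downloaded pipeline; a pipeline package name such as `"en_core_web_sm"` is used the same way.

```python
from pathlib import Path

from spacy import util

# Assumption: "spacy" stands in for a pipeline package name, because it is
# certain to be importable wherever spaCy is installed.
package_dir = util.get_package_path("spacy")

# The result is a pathlib.Path pointing at the package directory.
is_path = isinstance(package_dir, Path)
has_init = (package_dir / "__init__.py").exists()
```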

--------------------------------

### Create Example with Part-of-Speech Tags

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/training.mdx

An alternative, more concise way to create a reference Doc with gold-standard annotations using Example.from_dict, specifying 'tags'.

```python
words = ["I", "like", "stuff"]
tags = ["NOUN", "VERB", "NOUN"]
predicted = Doc(nlp.vocab, words=words)
example = Example.from_dict(predicted, {"tags": tags})
```

--------------------------------

### Download Trained Pipeline Package

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/models.mdx

Use the spaCy CLI download command to install the best-matching version of a trained pipeline package compatible with your spaCy installation. For example, 'en_core_web_sm'.

```bash
# Download best-matching version of a package for your spaCy installation
$ python -m spacy download en_core_web_sm
```

--------------------------------

### Build spaCy with setup.py (Deprecated)

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/index.mdx

Use `python setup.py` commands for editable mode and parallel builds. This method is no longer recommended in favor of pip-based installations.

```bash
$ pip install -r requirements.txt
$ python setup.py build_ext --inplace -j 4
$ python setup.py develop
```

--------------------------------

### TextCatEnsemble.v1 Example Configuration

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/legacy.mdx

Use this configuration to set up the TextCatEnsemble.v1 architecture. Ensure all parameters are correctly defined for your specific use case.

```ini
[model]
@architectures = "spacy.TextCatEnsemble.v1"
exclusive_classes = false
pretrained_vectors = null
width = 64
embed_size = 2000
conv_depth = 2
window_size = 1
ngram_size = 1
dropout = null
nO = null

```

--------------------------------

### Create Example with Named Entities

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/training.mdx

Demonstrates creating a Doc with gold-standard named entities using the BILUO tagging scheme via Example.from_dict.

```python
doc = Doc(nlp.vocab, words=["Facebook", "released", "React", "in", "2014"])
example = Example.from_dict(doc, {"entities": ["U-ORG", "O", "U-TECHNOLOGY", "O", "U-DATE"]})
```

--------------------------------

### Span Input Format

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/visualizers.mdx

Example of the JSON structure for visualizing custom spans, specifying start and end tokens.

```json
{
    "text": "Welcome to the Bank of China.",
    "spans": [
        {"start_token": 3, "end_token": 6, "label": "ORG"},
        {"start_token": 5, "end_token": 6, "label": "GPE"}
    ],
    "tokens": ["Welcome", "to", "the", "Bank", "of", "China", "."]
}
```
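
To make the token offsets concrete, the sketch below loads the same data as strict JSON and slices the token list for each span. The helper `span_tokens` is hypothetical and only illustrates that `start_token` is inclusive and `end_token` is exclusive.

```python
import json

# Same data as the example, written as strict JSON (no trailing comma).
raw = """
{
    "text": "Welcome to the Bank of China.",
    "spans": [
        {"start_token": 3, "end_token": 6, "label": "ORG"},
        {"start_token": 5, "end_token": 6, "label": "GPE"}
    ],
    "tokens": ["Welcome", "to", "the", "Bank", "of", "China", "."]
}
"""
data = json.loads(raw)


def span_tokens(entry, span):
    """Return the tokens covered by a span; end_token is exclusive."""
    return entry["tokens"][span["start_token"]:span["end_token"]]


covered = {span["label"]: span_tokens(data, span) for span in data["spans"]}
print(covered)
# {'ORG': ['Bank', 'of', 'China'], 'GPE': ['China']}
```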

--------------------------------

### Migrate Simple Training Style to Example

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/training.mdx

Illustrates the migration from the older simple training style to using the Example.from_dict method in spaCy v3.0.

```diff
text = "Facebook released React in 2014"

```
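
The v3 side of the migration can be sketched as follows. This is a minimal illustration, assuming a blank English pipeline and reusing the BILUO annotations from the named-entity example elsewhere in this collection. It only builds the `Example`; in v3 a list of such objects is what gets passed to `nlp.update`.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")  # assumption: a blank pipeline is enough to build the Example

text = "Facebook released React in 2014"
annotations = {"entities": ["U-ORG", "O", "U-TECHNOLOGY", "O", "U-DATE"]}

# v3 style: pair a freshly tokenized Doc with its gold-standard annotations.
example = Example.from_dict(nlp.make_doc(text), annotations)

# The reference Doc now carries the annotated entities.
gold_entities = [(ent.text, ent.label_) for ent in example.reference.ents]
```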

--------------------------------

### Access Predicted Document

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/example.mdx

Get the Doc object containing the pipeline's predictions from an Example. This is sometimes referred to as `example.x`.

```python
docs = [eg.predicted for eg in examples]
predictions, _ = model.begin_update(docs)
set_annotations(docs, predictions)
```

--------------------------------

### Debug Model - Inspect Dimensions, Parameters, and Gradients

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/cli.mdx

This example shows how to inspect model initialization (Step 1) and updates after a training step (Step 2). It prints layer dimensions, parameters (like the 'W' matrix and 'b' bias), and their sample values, which is crucial for verifying correct propagation and training feedback.

```bash
python -m spacy debug model ./config.cfg tagger -l "5,15" -DIM -PAR -P0 -P1 -P2
```

```text
ℹ Using CPU
ℹ Fixing random seed: 0
ℹ Analysing model with ID 62

========================= STEP 0 - before training =========================
ℹ Layer 5: model ID 60: 'softmax'
ℹ  - dim nO: None
ℹ  - dim nI: 96
ℹ  - param W: None
ℹ  - param b: None
ℹ Layer 15: model ID 40: 'residual'
ℹ  - dim nO: None
ℹ  - dim nI: None

======================= STEP 1 - after initialization =======================
ℹ Layer 5: model ID 60: 'softmax'
ℹ  - dim nO: 4
ℹ  - dim nI: 96
ℹ  - param W: (4, 96) - sample: [0. 0. 0. 0. 0.]
ℹ  - param b: (4,) - sample: [0. 0. 0. 0.]
ℹ Layer 15: model ID 40: 'residual'
ℹ  - dim nO: 96
ℹ  - dim nI: None

========================== STEP 2 - after training ==========================
ℹ Layer 5: model ID 60: 'softmax'
ℹ  - dim nO: 4
ℹ  - dim nI: 96
ℹ  - param W: (4, 96) - sample: [ 0.00283958 -0.00294119  0.00268396 -0.00296219
-0.00297141]
ℹ  - param b: (4,) - sample: [0.00300002 0.00300002 0.00300002 0.00300002]
ℹ Layer 15: model ID 40: 'residual'
ℹ  - dim nO: 96
ℹ  - dim nI: None
```

--------------------------------

### EntityLinker v2 Architecture Configuration

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/architectures.mdx

Example configuration for the spacy.EntityLinker.v2 architecture. This setup includes a tok2vec layer defined by spacy.HashEmbedCNN.v2.

```ini
[model]
@architectures = "spacy.EntityLinker.v2"
nO = null

[model.tok2vec]
@architectures = "spacy.HashEmbedCNN.v2"
pretrained_vectors = null
width = 96
depth = 2
embed_size = 2000
window_size = 1
maxout_pieces = 3
subword_features = true

```

--------------------------------

### Initialize StringStore

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/stringstore.mdx

Create a new StringStore instance and initialize it with a sequence of strings.

```python
from spacy.strings import StringStore
stringstore = StringStore(["apple", "orange"])
```
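
A short sketch of how the initialized store can be queried. The variable names are illustrative; the lookups shown (string to hash, hash to string, membership, length) are standard `StringStore` operations.

```python
from spacy.strings import StringStore

stringstore = StringStore(["apple", "orange"])

apple_hash = stringstore["apple"]     # string -> 64-bit hash
apple_text = stringstore[apple_hash]  # hash -> string

known = "apple" in stringstore        # membership test by string
unknown = "cherry" in stringstore     # strings never added are not found
size = len(stringstore)               # number of stored strings
```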

--------------------------------

### Accessing Transformer Outputs

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/transformer.mdx

Get the last hidden layer output for a specific token. Requires the spacy-transformers library to be installed and configured.

```python
# Get the last hidden layer output for "is" (token index 1)
doc = nlp("This is a text.")
indices = doc._.trf_data.align[1].data.flatten()
last_hidden_state = doc._.trf_data.model_output.last_hidden_state
dim = last_hidden_state.shape[-1]
tensors = last_hidden_state.reshape(-1, dim)[indices]
```

--------------------------------

### Get Loss with EditTreeLemmatizer

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/edittreelemmatizer.mdx

Calculate the loss and gradient for a batch of documents and their predicted scores. Requires the batch of examples and the model's scores.

```python
lemmatizer = nlp.add_pipe("trainable_lemmatizer", name="lemmatizer")
scores = lemmatizer.model.begin_update([eg.predicted for eg in examples])
loss, d_loss = lemmatizer.get_loss(examples, scores)
```

--------------------------------

### Get Token Alignment from Example

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/example.mdx

Access the alignment object to map tokens between predicted and reference documents. This is useful for comparing token-level correspondences.

```python
tokens_x = ["Apply", "some", "sunscreen"]
x = Doc(vocab, words=tokens_x)
tokens_y = ["Apply", "some", "sun", "screen"]
example = Example.from_dict(x, {"words": tokens_y})
alignment = example.alignment
assert list(alignment.y2x.data) == [[0], [1], [2], [2]]
```
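
For a self-contained variant, the sketch below takes the vocab from a blank English pipeline (an assumption; any shared `Vocab` should work) and inspects how many tokens on one side each token on the other side aligns to.

```python
import spacy
from spacy.tokens import Doc
from spacy.training import Example

vocab = spacy.blank("en").vocab  # assumption: any shared Vocab works here

x = Doc(vocab, words=["Apply", "some", "sunscreen"])
example = Example.from_dict(x, {"words": ["Apply", "some", "sun", "screen"]})
alignment = example.alignment

# Number of reference tokens aligned to each predicted token, and vice versa.
x2y_lengths = [int(n) for n in alignment.x2y.lengths]
y2x_lengths = [int(n) for n in alignment.y2x.lengths]
```

Here the predicted token "sunscreen" maps to two reference tokens, while every reference token maps back to exactly one predicted token.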

--------------------------------

### Initialize and Save Config File

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/cli.mdx

Use this command to create a training-ready config.cfg file with recommended settings for your use case. It auto-fills default values and can be customized later.

```bash
python -m spacy init config config.cfg --lang en --pipeline ner,textcat --optimize accuracy
```

--------------------------------

### Configure spaCy Sentiment Task

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/large-language-models.mdx

Example configuration for the spacy.Sentiment.v1 task. This setup is used to define the LLM task component within a spaCy pipeline.

```ini
[components.llm.task]
@llm_tasks = "spacy.Sentiment.v1"
examples = null

```

--------------------------------

### Initialize a blank English model with lookups

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/v2-2.mdx

When starting with a blank model and requiring lookup data for lemmatization, explicitly install spaCy with lookups and initialize the model.

```python
import spacy
nlp = spacy.blank("en")
```

--------------------------------

### Install and Use spacy-huggingface-hub CLI

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/v3-1.mdx

Install the package, log in to Hugging Face CLI, package a spaCy model, and push it to the Hub. Ensure you are in the output directory containing the wheel file before pushing.

```bash
pip install spacy-huggingface-hub
huggingface-cli login
python -m spacy package ./en_ner_fashion ./output --build wheel
cd ./output/en_ner_fashion-0.0.0/dist
python -m spacy huggingface-hub push en_ner_fashion-0.0.0-py3-none-any.whl
```

--------------------------------

### Example: Push a specific spaCy pipeline wheel file

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/cli.mdx

An example demonstrating how to push a specific `.whl` file for a spaCy pipeline to the Hugging Face Hub.

```bash
python -m spacy huggingface-hub push en_ner_fashion-0.0.0-py3-none-any.whl
```

--------------------------------

### Get Morphologizer Loss and Gradient

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/morphologizer.mdx

Calculate the loss and gradient for a batch of documents based on predicted scores. Requires the batch of examples and the model's predictions.

```python
morphologizer = nlp.add_pipe("morphologizer")
scores = morphologizer.predict([eg.predicted for eg in examples])
loss, d_loss = morphologizer.get_loss(examples, scores)
```

--------------------------------

### Migrate begin_training to initialize

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/v3.mdx

The begin_training methods have been renamed to initialize in spaCy v3. The initialize method now accepts a function returning Example objects for model setup.

```diff
- nlp.begin_training()
+ nlp.initialize(lambda: examples)

```
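
A minimal sketch of the new call, assuming a blank English pipeline with a single `tagger` component and one hand-written example. The data is illustrative only and far too small for real training.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
tagger = nlp.add_pipe("tagger")

# Assumption: one tiny hand-written example, only to show the call shape.
doc = nlp.make_doc("I like stuff")
examples = [Example.from_dict(doc, {"tags": ["NOUN", "VERB", "NOUN"]})]

# v3: pass a callable that returns the examples.
optimizer = nlp.initialize(lambda: examples)

# The tagger infers its label set from the examples during initialization.
tagger_labels = set(tagger.labels)
```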

--------------------------------

### Install spaCy and Download Models

Source: https://context7.com/explosion/spacy/llms.txt

Install spaCy and download a trained pipeline model. Use `en_core_web_sm` for a small English model or `en_core_web_md` for a larger one with word vectors.

```bash
pip install -U pip setuptools wheel
pip install spacy
```

```bash
python -m spacy download en_core_web_sm
```

```bash
python -m spacy download en_core_web_md
```

--------------------------------

### Initialize Training Configuration

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/v3.mdx

Generate a starter training configuration file using the spacy init config command, specifying language and pipeline components.

```bash
$ python -m spacy init config ./config.cfg --lang en --pipeline tagger,parser
```

--------------------------------

### TextCat v3 Configuration Example

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/large-language-models.mdx

Use this configuration to define labels and their definitions for the TextCat v3 component. This setup is useful for providing context to the LLM about the nature of each label.

```ini
[components.llm.task]
@llm_tasks = "spacy.TextCat.v3"
labels = ["COMPLIMENT", "INSULT"]
examples = null

[components.llm.task.label_definitions]
"COMPLIMENT" = "a polite expression of praise or admiration."
"INSULT" = "a disrespectful or scornfully abusive remark or act."

```

--------------------------------

### Access Transformer Output for a Token

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/curatedtransformer.mdx

Retrieve the hidden state tensor for a specific token from the `Doc._.trf_data` attribute. This example shows how to get the output for the token 'is' (index 1).

```python
# Get the last hidden layer output for "is" (token index 1)
doc = nlp("This is a text.")
tensors = doc._.trf_data.last_hidden_layer_state[1]
```

--------------------------------

### Initialize TextCategorizer with Default and Custom Models

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/textcategorizer.mdx

Demonstrates constructing a TextCategorizer using nlp.add_pipe with default settings and a custom configuration. Also shows direct instantiation from the class.

```python
# Construction via add_pipe with default model
# Use 'textcat_multilabel' for multi-label classification
textcat = nlp.add_pipe("textcat")

# Construction via add_pipe with custom model
config = {"model": {"@architectures": "my_textcat"}}
textcat = nlp.add_pipe("textcat", config=config)

# Construction from class
# Use 'MultiLabel_TextCategorizer' for multi-label classification
from spacy.pipeline import TextCategorizer
textcat = TextCategorizer(nlp.vocab, model, threshold=0.5)
```

--------------------------------

### DependencyParser Initialization

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/dependencyparser.mdx

Demonstrates how to initialize the DependencyParser, either through nlp.add_pipe with default or custom configurations, or directly from the class.

```APIDOC
## DependencyParser.__init__

### Description
Create a new pipeline instance. In your application, you would normally use a shortcut for this and instantiate the component using its string name and [`nlp.add_pipe`](/api/language#add_pipe).

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Parameters
- **vocab** (object) - The shared vocabulary.
- **model** (object) - The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component.
- **name** (string) - String name of the component instance. Used to add entries to the `losses` during training.
- **moves** (object) - A list of transition names. Inferred from the data if not provided.
- **update_with_oracle_cut_size** (int) - During training, cut long sequences into shorter segments by creating intermediate states based on the gold-standard history. Defaults to `100`.
- **learn_tokens** (bool) - Whether to learn to merge subtokens that are split relative to the gold standard. Experimental. Defaults to `False`.
- **min_action_freq** (int) - The minimum frequency of labelled actions to retain. Rarer labelled actions have their label backed-off to "dep".
- **scorer** (object) - The scoring method. Defaults to [`Scorer.score_deps`](/api/scorer#score_deps) for the attribute "dep" ignoring the labels `p` and `punct` and [`Scorer.score_spans`](/api/scorer/#score_spans) for the attribute "sents".

### Request Example
```python
# Construction via add_pipe with default model
parser = nlp.add_pipe("parser")

# Construction via add_pipe with custom model
config = {"model": {"@architectures": "my_parser"}}
parser = nlp.add_pipe("parser", config=config)

# Construction from class
from spacy.pipeline import DependencyParser
parser = DependencyParser(nlp.vocab, model)
```

### Response
#### Success Response (200)
None

#### Response Example
None
```

--------------------------------

### Python Code Example for Project Integration

Source: https://github.com/explosion/spacy/blob/master/website/UNIVERSE.md

This Python code demonstrates how to load a spaCy model and add a custom pipeline component using a package. Ensure the package is installed and the model is available.

```python
import spacy
import package_name

nlp = spacy.load('en')
nlp.add_pipe(package_name)
```

--------------------------------

### Load and Use Custom Component

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/saving-loading.mdx

This example demonstrates loading a spaCy pipeline with the custom 'snek' component after the package has been installed. It shows that the component can be added using `nlp.add_pipe('snek')` without explicit import, and then used to process a document.

```python
from spacy.lang.en import English
nlp = English()
nlp.add_pipe("snek")  # this now works! 🐍🎉
doc = nlp("I am snek")

```
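
The `"snek"` name resolves because the installed package registers the component for spaCy. For quick local experiments, a component can also be registered in the current process with the `@Language.component` decorator. The following is a minimal sketch assuming spaCy v3; the component name `snek_sketch` and the `user_data` key are hypothetical and are not part of the package described above.

```python
from spacy.lang.en import English
from spacy.language import Language


# Hypothetical stand-in component, registered in-process rather than
# through an installed package.
@Language.component("snek_sketch")
def snek_sketch(doc):
    # Leave a marker on the Doc so we can see that the component ran.
    doc.user_data["snek_seen"] = True
    return doc


nlp = English()
nlp.add_pipe("snek_sketch")
doc = nlp("I am snek")
print(nlp.pipe_names)
```

Registering in-process is convenient for prototyping, while the packaged approach makes the component available without importing the module that defines it.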

--------------------------------

### Install and Initialize DVC

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/projects.mdx

Install DVC and initialize your spaCy project as a Git and DVC repository. This sets up DVC for tracking data assets.

```bash
pip install dvc   # Install DVC
git init          # Initialize a Git repo
dvc init          # Initialize a DVC project
```

--------------------------------

### Analyze Pipeline Components in Python

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/processing-pipelines.mdx

Use nlp.analyze_pipes to inspect pipeline components. This example demonstrates adding a 'tagger' and an 'entity_linker' to a blank English pipeline and then analyzing the component configurations. Note that the 'entity_linker' has unmet requirements in this initial setup.

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("tagger")
# This is a problem because it needs entities and sentence boundaries
nlp.add_pipe("entity_linker")
analysis = nlp.analyze_pipes(pretty=True)
```
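
The analysis can also be inspected programmatically. The sketch below assumes spaCy v3 and that the returned dictionary contains a `"problems"` entry mapping each component name to the annotations it requires but that no earlier component sets; the exact list of missing annotations may differ between versions.

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("tagger")
nlp.add_pipe("entity_linker")

# Without pretty=True nothing is printed; the analysis is returned as a dict.
analysis = nlp.analyze_pipes()
for name, missing in analysis["problems"].items():
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(name, "->", status)
```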

--------------------------------

### Debug Config Output (Valid Config and Options)

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/cli.mdx

Example output for a valid configuration file, showing registered functions and variables. Use --show-functions and --show-variables flags to display this information.

```bash
$ python -m spacy debug config ./config.cfg --show-functions --show-variables
```

```text
============================= Config validation =============================
✔ Config is valid

=============================== Variables (6) ===============================

Variable                                   Value
-----------------------------------------  ----------------------------------
${components.tok2vec.model.encode.width}   96
${paths.dev}                               'hello'
${paths.init_tok2vec}                      None
${paths.raw}                               None
${paths.train}                             ''
${system.seed}                             0


========================= Registered functions (17) =========================
ℹ [nlp.tokenizer]
Registry   @tokenizers
Name       spacy.Tokenizer.v1
Module     spacy.language
File       /path/to/spacy/language.py (line 64)
ℹ [components.ner.model]
Registry   @architectures
Name       spacy.TransitionBasedParser.v1
Module     spacy.ml.models.parser
File       /path/to/spacy/ml/models/parser.py (line 11)
ℹ [components.ner.model.tok2vec]
Registry   @architectures
Name       spacy.Tok2VecListener.v1
Module     spacy.ml.models.tok2vec
File       /path/to/spacy/ml/models/tok2vec.py (line 16)
ℹ [components.parser.model]
Registry   @architectures
Name       spacy.TransitionBasedParser.v1
Module     spacy.ml.models.parser
File       /path/to/spacy/ml/models/parser.py (line 11)
ℹ [components.parser.model.tok2vec]
Registry   @architectures
Name       spacy.Tok2VecListener.v1
Module     spacy.ml.models.tok2vec
File       /path/to/spacy/ml/models/tok2vec.py (line 16)
ℹ [components.tagger.model]
Registry   @architectures
Name       spacy.Tagger.v1
Module     spacy.ml.models.tagger
File       /path/to/spacy/ml/models/tagger.py (line 9)
ℹ [components.tagger.model.tok2vec]
Registry   @architectures
Name       spacy.Tok2VecListener.v1
Module     spacy.ml.models.tok2vec
File       /path/to/spacy/ml/models/tok2vec.py (line 16)
ℹ [components.tok2vec.model]
Registry   @architectures
Name       spacy.Tok2Vec.v1
Module     spacy.ml.models.tok2vec
File       /path/to/spacy/ml/models/tok2vec.py (line 72)
ℹ [components.tok2vec.model.embed]
Registry   @architectures
Name       spacy.MultiHashEmbed.v1
Module     spacy.ml.models.tok2vec
File       /path/to/spacy/ml/models/tok2vec.py (line 93)
ℹ [components.tok2vec.model.encode]
Registry   @architectures
Name       spacy.MaxoutWindowEncoder.v1
Module     spacy.ml.models.tok2vec
File       /path/to/spacy/ml/models/tok2vec.py (line 207)
ℹ [corpora.dev]
Registry   @readers
Name       spacy.Corpus.v1
Module     spacy.training.corpus
File       /path/to/spacy/training/corpus.py (line 18)
ℹ [corpora.train]
Registry   @readers
Name       spacy.Corpus.v1
Module     spacy.training.corpus
File       /path/to/spacy/training/corpus.py (line 18)
ℹ [training.logger]
Registry   @loggers
Name       spacy.ConsoleLogger.v1
Module     spacy.training.loggers
File       /path/to/spacy/training/loggers.py (line 8)
ℹ [training.batcher]
Registry   @batchers

```

--------------------------------

### Extract Named Entities from Text

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/101/_named-entities.mdx

Use this Python code to load a spaCy model, process a document, and iterate through detected named entities, printing their text, start and end character positions, and labels. Ensure you have the 'en_core_web_sm' model installed.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
```
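
If the trained pipeline is not installed yet, the same loop can be tried on a blank pipeline with hand-set entities. This is only a sketch: the character offsets and labels below are written by hand to stand in for a model's predictions, and a blank pipeline does not predict entities on its own.

```python
import spacy

nlp = spacy.blank("en")  # tokenizer only, no trained components
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# Hand-written annotations standing in for a model's predictions.
annotations = [(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")]
doc.ents = [doc.char_span(start, end, label=label) for start, end, label in annotations]

rows = [(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
for row in rows:
    print(*row)
```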

--------------------------------

### Example project.yml Structure

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/projects.mdx

This is an example of a project.yml file, which defines project assets and commands. It is similar to CI pipeline configuration files.

```yaml
%%GITHUB_PROJECTS/pipelines/tagger_parser_ud/project.yml
```

--------------------------------

### Load Pipeline Meta Data

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/top-level.mdx

Get a pipeline's meta.json from a file path and validate its contents. The meta data typically includes details about author, licensing, data sources, and version. Use this to retrieve metadata about installed pipelines.

```python
from spacy import util

meta = util.load_meta("/path/to/meta.json")
```
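
To try this without an installed pipeline, a small `meta.json` can be written to a temporary directory first. The sketch below assumes that `lang`, `name` and `version` are the settings checked during validation; the values are hypothetical and exist only for illustration.

```python
import json
import tempfile
from pathlib import Path

from spacy import util

with tempfile.TemporaryDirectory() as tmp_dir:
    meta_path = Path(tmp_dir) / "meta.json"
    # Hypothetical metadata for illustration only.
    meta_path.write_text(
        json.dumps({"lang": "en", "name": "demo_pipeline", "version": "0.0.0"}),
        encoding="utf8",
    )
    meta = util.load_meta(meta_path)

print(meta["lang"], meta["name"], meta["version"])
```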

--------------------------------

### Install spacy-huggingface-hub and Login to Hugging Face

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/cli.mdx

Install the necessary package and log in to your Hugging Face account to enable uploading pipelines.

```bash
pip install spacy-huggingface-hub
huggingface-cli login
```

--------------------------------

### Span Data Structure Example

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/top-level.mdx

This JSON structure describes labeled spans over a tokenized text, as used for span visualization. It includes the text, a list of spans with their start and end token indices and labels, and the list of tokens. Optional fields like 'title' and 'settings' can be provided for visualization.

```json
{
  "text": "Welcome to the Bank of China.",
  "spans": [
    { "start_token": 3, "end_token": 6, "label": "ORG" },
    { "start_token": 5, "end_token": 6, "label": "GPE" }
  ],
  "tokens": ["Welcome", "to", "the", "Bank", "of", "China", "."]
}
```
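
The token indices can be resolved back to text with plain Python. The sketch below uses only the standard library and assumes half-open indices (the token at `start_token` is included, the one at `end_token` is not), which matches the `Span(doc, 3, 6, "ORG")` convention used elsewhere in these docs.

```python
import json

raw = """
{
  "text": "Welcome to the Bank of China.",
  "spans": [
    { "start_token": 3, "end_token": 6, "label": "ORG" },
    { "start_token": 5, "end_token": 6, "label": "GPE" }
  ],
  "tokens": ["Welcome", "to", "the", "Bank", "of", "China", "."]
}
"""
data = json.loads(raw)

# Slice the token list with each span's indices and join the pieces.
resolved = [
    (" ".join(data["tokens"][span["start_token"]:span["end_token"]]), span["label"])
    for span in data["spans"]
]
print(resolved)
```

The two spans overlap on the token "China", which is why this structure uses a list of spans rather than one label per token.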

--------------------------------

### Create Example from Dictionary

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/data-formats.mdx

Use `Example.from_dict` to create a training instance from a reference Doc and a dictionary of gold-standard annotations. This method is part of the internal training API.

```python
from spacy.training import Example

example = Example.from_dict(doc, gold_dict)
```
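
A fuller sketch, assuming spaCy v3 and a blank English pipeline, shows where the annotations end up. The sentence and the entity offsets are made up for illustration.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
doc = nlp.make_doc("I like London")

# Character offsets and label of the gold-standard entity (illustrative).
gold_dict = {"entities": [(7, 13, "LOC")]}
example = Example.from_dict(doc, gold_dict)

# The annotations are stored on the reference Doc of the Example.
gold_ents = [(ent.text, ent.label_) for ent in example.reference.ents]
print(gold_ents)
```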

--------------------------------

### Basic Span Visualization

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/visualizers.mdx

This example demonstrates how to use the `displacy.serve` function with the 'span' style to visualize overlapping spans in a text. It requires importing `spacy` and `displacy`, creating a blank English model, processing text, and defining spans with their start, end, and label.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

text = "Welcome to the Bank of China."

nlp = spacy.blank("en")
doc = nlp(text)

doc.spans["sc"] = [
    Span(doc, 3, 6, "ORG"),
    Span(doc, 5, 6, "GPE"),
]

displacy.serve(doc, style="span")
```
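
When a running web server is not wanted, `displacy.render` can return the markup as a string instead. This is a sketch assuming a spaCy version that supports the `span` style; passing `jupyter=False` is a choice made here so that the markup is returned rather than displayed in a notebook.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Welcome to the Bank of China.")
doc.spans["sc"] = [Span(doc, 3, 6, "ORG"), Span(doc, 5, 6, "GPE")]

# jupyter=False forces the markup to be returned instead of displayed.
html = displacy.render(doc, style="span", jupyter=False)
print("ORG" in html, "GPE" in html)
```

The returned string can then be written to an `.html` file or embedded in a page.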

--------------------------------

### Create Configuration File

Source: https://github.com/explosion/spacy/blob/master/examples/README.md

Console output from running `spacy init config` to generate a configuration file for training an NER pipeline.

```text
Running command: /home/user/venv/bin/python -m spacy init config --lang en --pipeline ner configs/config.cfg --force
ℹ Generated config template specific for your use case
- Language: en
- Pipeline: ner
- Optimize for: efficiency
- Hardware: CPU
- Transformer: None
✔ Auto-filled config with all values
✔ Saved config
configs/config.cfg
You can now add your data and train your pipeline:
python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy
```

--------------------------------

### Score TextCategorizer Examples

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/textcategorizer.mdx

Use the score method to evaluate a batch of examples with the TextCategorizer. The input 'examples' should be an iterable of Example objects.

```python
scores = textcat.score(examples)
```

--------------------------------

### Project Training Configuration

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/projects.mdx

Example `project.yml` configuration for a 'train' command, specifying the script, dependencies, and expected outputs.

```yaml
- name: train
  help: 'Train a spaCy pipeline using the specified corpus and config'
  script:
    - 'spacy train ./config.cfg --output training/'
  deps:
    - 'corpus/train'
    - 'corpus/dev'
    - 'config.cfg'
  outputs:
    - 'training/model-best'
```
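
The role of `deps` can be illustrated with a small standard-library check that reports which dependencies are missing before a command is run. This is a conceptual sketch only and not how spaCy projects are implemented; the helper `missing_paths` is hypothetical.

```python
from pathlib import Path

# The `train` command from above, written as a plain Python dict.
command = {
    "name": "train",
    "deps": ["corpus/train", "corpus/dev", "config.cfg"],
    "outputs": ["training/model-best"],
}


def missing_paths(paths, root="."):
    """Return the entries of `paths` that do not exist under `root`."""
    return [p for p in paths if not (Path(root) / p).exists()]


missing_deps = missing_paths(command["deps"])
if missing_deps:
    print("Cannot run 'train' yet, missing:", missing_deps)
else:
    print("All dependencies present")
```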

--------------------------------

### Fill Configuration Defaults

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/training.mdx

Use `init fill-config` to populate a base configuration file with default settings. This ensures a complete and reproducible configuration for training.

```bash
python -m spacy init fill-config base_config.cfg config.cfg
```
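
Conceptually, filling a config means keeping every value the base file sets and adding defaults for everything it leaves out. The sketch below illustrates that idea with nested dictionaries; it is not spaCy's implementation, and the keys and default values are hypothetical.

```python
def fill_defaults(base, defaults):
    """Return a copy of `base` with missing keys taken from `defaults`."""
    filled = dict(base)
    for key, default_value in defaults.items():
        if key not in filled:
            filled[key] = default_value
        elif isinstance(filled[key], dict) and isinstance(default_value, dict):
            # Both sides are sections: fill the nested section recursively.
            filled[key] = fill_defaults(filled[key], default_value)
    return filled


# Hypothetical values for illustration; not spaCy's real defaults.
defaults = {"training": {"dropout": 0.1, "max_steps": 20000}, "system": {"seed": 0}}
base = {"training": {"dropout": 0.2}}

filled = fill_defaults(base, defaults)
print(filled)
```

Values set explicitly in the base config (here `dropout`) are kept, and only the missing settings are added.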

--------------------------------

### Install spaCy with CUDA GPU Support

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/index.mdx

Install spaCy with GPU support by specifying the CUDA version in the pip install command. This installs CuPy for GPU array compatibility.

```bash
pip install -U spacy[cuda113]
```

--------------------------------

### Few-Shot Learning Example for Summarization

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/large-language-models.mdx

An example of how to structure few-shot learning examples for the summarization task in YAML format.

```yaml
- text: >
    The United Nations, referred to informally as the UN, is an
    intergovernmental organization whose stated purposes are to maintain
    international peace and security, develop friendly relations among nations,
    achieve international cooperation, and serve as a centre for harmonizing the
    actions of nations. It is the world's largest international organization.
    The UN is headquartered on international territory in New York City, and the
    organization has other offices in Geneva, Nairobi, Vienna, and The Hague,
    where the International Court of Justice is headquartered.\n\n    The UN was
    established after World War II with the aim of preventing future world wars,
    and succeeded the League of Nations, which was characterized as
    ineffective.
  summary:
    'The UN is an international organization that promotes global peace,
    cooperation, and harmony. Established after WWII, its purpose is to prevent
    future world wars.'
```

--------------------------------

### Install spaCy with conda

Source: https://github.com/explosion/spacy/blob/master/README.md

Install spaCy from the conda-forge channel using the conda package manager. This is an alternative to pip installation.

```bash
conda install -c conda-forge spacy
```

--------------------------------

### Install spaCy in Editable Mode

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/index.mdx

Install spaCy in editable mode for development. Changes to Python files are reflected immediately, but Cython file edits require rerunning the install command. Ensure previous installs are removed.

```bash
$ pip install -r requirements.txt
$ pip install --no-build-isolation --editable .
```

--------------------------------

### Serve Dependency Visualization with Options

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/top-level.mdx

Use this to serve a dependency visualization with custom options. The `options` dictionary can include keys like `compact` and `color`; the full set of supported keys is listed in the displaCy API documentation.

```python
from spacy import displacy

options = {"compact": True, "color": "blue"}
displacy.serve(doc, style="dep", options=options)
```
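
The same options can be passed to `displacy.render` to obtain the SVG markup as a string. The sketch below assumes spaCy v3 and builds a `Doc` with hand-written heads and dependency labels so that no trained pipeline is needed; the sentence and its annotations are illustrative.

```python
import spacy
from spacy import displacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hand-annotated parse so that no trained pipeline is needed.
words = ["This", "is", "a", "sentence"]
heads = [1, 1, 3, 1]  # index of each token's head; the root points at itself
deps = ["nsubj", "ROOT", "det", "attr"]
doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps)

options = {"compact": True, "color": "blue"}
# jupyter=False forces the markup to be returned instead of displayed.
svg = displacy.render(doc, style="dep", options=options, jupyter=False)
print(svg[:60])
```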

--------------------------------

### Few-Shot Lemma Examples

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/large-language-models.mdx

YAML format for providing few-shot examples for the spacy.Lemma.v1 task. Each example includes text and its corresponding lemmas.

```yaml
- text: I'm buying ice cream.
  lemmas:
    - 'I': 'I'
    - "'m": 'be'
    - 'buying': 'buy'
    - 'ice': 'ice'
    - 'cream': 'cream'
    - '.': '.'

- text: I've watered the plants.
  lemmas:
    - 'I': 'I'
    - "'ve": 'have'
    - 'watered': 'water'
    - 'the': 'the'
    - 'plants': 'plant'
    - '.': '.'

```
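
Before using such a file, it can help to check that the listed tokens line up with the text. The following standard-library sketch is a hypothetical sanity check and not part of spacy-llm; it uses the first example above written as Python data and verifies that the tokens occur in order and cover all non-space characters.

```python
examples = [
    {
        "text": "I'm buying ice cream.",
        "lemmas": [
            {"I": "I"}, {"'m": "be"}, {"buying": "buy"},
            {"ice": "ice"}, {"cream": "cream"}, {".": "."},
        ],
    },
]


def tokens_cover_text(example):
    """Check that the listed tokens occur in order and cover the text."""
    text = example["text"]
    position = 0
    for entry in example["lemmas"]:
        # Each entry maps exactly one token to its lemma.
        token, _lemma = next(iter(entry.items()))
        # Skip whitespace between tokens.
        while position < len(text) and text[position].isspace():
            position += 1
        if not text.startswith(token, position):
            return False
        position += len(token)
    return text[position:].strip() == ""


print(all(tokens_cover_text(example) for example in examples))
```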

--------------------------------

### Install spaCy Model from Local File

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/models.mdx

Install a spaCy model from a local wheel file or tar.gz archive. Ensure the path to the file is correct.

```bash
$ pip install /Users/you/en_core_web_sm-3.0.0-py3-none-any.whl
```

```bash
$ pip install /Users/you/en_core_web_sm-3.0.0.tar.gz
```
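
Because an installed pipeline is a regular Python package, a quick standard-library check can confirm that the installation succeeded. This is a minimal sketch; the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be imported in this environment."""
    return importlib.util.find_spec(package_name) is not None


print(is_installed("en_core_web_sm"))
```

If this prints `True`, the pipeline can then be loaded with `spacy.load("en_core_web_sm")`.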