### Example: Packaging and Installing a Pipeline

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/cli.mdx

This example demonstrates how to package a pipeline and then install it using pip. After running the package command, navigate to the created directory and install the generated .tar.gz archive.

```bash
python -m spacy package /input /output
cd /output/en_pipeline-0.0.0
pip install dist/en_pipeline-0.0.0.tar.gz
```

--------------------------------

### Start spaCy Website Development Server

Source: https://github.com/explosion/spacy/blob/master/website/README.md

After installing dependencies with `npm install`, start the development server to preview changes locally.

```bash
npm run dev
```

--------------------------------

### Example Training Command and Output

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/top-level.mdx

This shows the command to start training with a spaCy config file and an example of the console output, including loss and accuracy metrics. Note that the losses are cumulative and keep increasing within an epoch.

```bash
$ python -m spacy train config.cfg
```

```text
ℹ Using CPU
ℹ Loading config and nlp from: config.cfg
ℹ Pipeline: ['tok2vec', 'tagger']
ℹ Start training
ℹ Training. Initial learn rate: 0.0
ℹ Saving results to training_log.jsonl

E    #       LOSS TOK2VEC  LOSS TAGGER  TAG_ACC  SCORE
---  ------  ------------  -----------  -------  ------
  0       0          0.00        86.20     0.22    0.00
  0     200          3.08     18968.78    34.00    0.34
  0     400         31.81     22539.06    33.64    0.34
  0     600         92.13     22794.91    43.80    0.44
  0     800        183.62     21541.39    56.05    0.56
  0    1000        352.49     25461.82    65.15    0.65
  0    1200        422.87     23708.82    71.84    0.72
  0    1400        601.92     24994.79    76.57    0.77
  0    1600        662.57     22268.02    80.20    0.80
  0    1800       1101.50     28413.77    82.56    0.83
  0    2000       1253.43     28736.36    85.00    0.85
  0    2200       1411.02     28237.53    87.42    0.87
  0    2400       1605.35     28439.95    88.70    0.89
```

--------------------------------

### spacy.load under the hood (abstract example)

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/processing-pipelines.mdx

This abstract example demonstrates the internal steps spaCy takes when loading a pipeline, including getting the language class, initializing it, adding components, and loading data.

```python
import spacy

lang = "en"
pipeline = ["tok2vec", "tagger", "parser", "ner", "attribute_ruler", "lemmatizer"]
data_path = "path/to/en_core_web_sm/en_core_web_sm-3.0.0"

cls = spacy.util.get_lang_class(lang)  # 1. Get Language class, e.g. English
nlp = cls()                            # 2. Initialize it
for name in pipeline:
    nlp.add_pipe(name, config={...})   # 3. Add the component to the pipeline
nlp.from_disk(data_path)               # 4. Load in the binary data
```

--------------------------------

### Print spaCy Installation Info

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/cli.mdx

Use this command to display details about your spaCy installation, trained pipelines, and local setup. The `--markdown` flag formats the output for easy copy-pasting into GitHub issues.

```bash
$ python -m spacy info [--markdown] [--silent] [--exclude]
```

```bash
$ python -m spacy info en_core_web_lg --markdown
```

--------------------------------

### Configure Few-Shot Prompt Examples via Callback

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/large-language-models.mdx

Initialize the LLM component with a `get_examples` callback and set `n_prompt_examples` to automatically fetch examples for few-shot learning. A value of -1 uses all available examples.

```ini
[initialize.components.llm]
n_prompt_examples = 3
```

--------------------------------

### Get spaCy Installation Info

Source: https://github.com/explosion/spacy/blob/master/CONTRIBUTING.md

Use this command to print details about your spaCy installation and environment, formatted as Markdown for easy copy-pasting into GitHub issues.

```bash
python -m spacy info --markdown
```

--------------------------------

### Install Requirements and Assets

Source: https://github.com/explosion/spacy/blob/master/examples/README.md

Navigate to the project directory and install dependencies, then download necessary data assets.

```bash
cd ner_demo
python -m pip install -r requirements.txt
python -m spacy project assets
```

--------------------------------

### Download and load a spaCy pipeline

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/index.mdx

After installing spaCy, download a trained pipeline and load it for use. This example shows downloading the 'en_core_web_sm' model.

```bash
python -m spacy download en_core_web_sm
```

```python
import spacy

nlp = spacy.load("en_core_web_sm")
```

--------------------------------

### Initialize Method for Model Setup

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/layers-architectures.mdx

Initializes the model by either using explicitly provided labels or by inferring them from training examples. It then calls the model's initialize method with a sample of data.

```python
from itertools import islice
from typing import Callable, Iterable, List, Optional

from spacy.language import Language
from spacy.training import Example

def initialize(
    self,
    get_examples: Callable[[], Iterable[Example]],
    *,
    nlp: Language = None,
    labels: Optional[List[str]] = None,
):
    if labels is not None:
        # Labels were provided explicitly
        for label in labels:
            self.add_label(label)
    else:
        # Infer the labels from the gold-standard relations
        for example in get_examples():
            relations = example.reference._.rel
            for indices, label_dict in relations.items():
                for label in label_dict.keys():
                    self.add_label(label)
    # Initialize the model with a small sample of the data
    subbatch = list(islice(get_examples(), 10))
    doc_sample = [eg.reference for eg in subbatch]
    label_sample = self._examples_to_truth(subbatch)
    self.model.initialize(X=doc_sample, Y=label_sample)
```

--------------------------------

### Example Class Initialization

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/example.mdx

Constructs an Example object from predicted and reference documents. Alignment can be optionally provided.

```APIDOC
## Example.__init__

### Description
Construct an `Example` object from the `predicted` document and the `reference` document. If `alignment` is `None`, it will be initialized from the words in both documents.

### Method
`__init__`

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
- **predicted** (Doc) - Required - The document containing (partial) predictions.
- **reference** (Doc) - Required - The document containing gold-standard annotations.
- **alignment** (Optional[Alignment]) - Optional - An object holding the alignment between the tokens of the `predicted` and `reference` documents.

### Request Example
```python
import spacy
from spacy.tokens import Doc
from spacy.training import Example

# Assuming 'nlp' is a loaded spaCy model
nlp = spacy.load("en_core_web_sm")

pred_words = ["Apply", "some", "sunscreen"]
pred_spaces = [True, True, False]
gold_words = ["Apply", "some", "sun", "screen"]
gold_spaces = [True, True, False, False]
gold_tags = ["VERB", "DET", "NOUN", "NOUN"]
predicted = Doc(nlp.vocab, words=pred_words, spaces=pred_spaces)
reference = Doc(nlp.vocab, words=gold_words, spaces=gold_spaces, tags=gold_tags)
example = Example(predicted, reference)
```

### Response
#### Success Response (200)
- **Example** (Example) - The newly constructed Example object.

#### Response Example
(No specific response example provided for constructor, but the object is created.)
```

--------------------------------

### Installation and Model Download

Source: https://context7.com/explosion/spacy/llms.txt

Instructions for installing spaCy and downloading trained pipeline models.

```APIDOC
## Installation

Install spaCy and download a trained pipeline model.

```bash
# Install spaCy
pip install -U pip setuptools wheel
pip install spacy

# Download English pipeline model
python -m spacy download en_core_web_sm

# Alternative: Download larger model with word vectors
python -m spacy download en_core_web_md
```
```

--------------------------------

### Tagger Configuration Example

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/tagger.mdx

Configuration example for initializing the Tagger component, specifying the path to the label data.

```ini
### config.cfg
[initialize.components.tagger]

[initialize.components.tagger.labels]
@readers = "spacy.read_labels.v1"
path = "corpus/labels/tagger.json"
```

--------------------------------

### Install spacy-experimental

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/coref.mdx

Install the spacy-experimental package to use the CoreferenceResolver component.

```bash
$ pip install -U spacy-experimental
```

--------------------------------

### DependencyParser Configuration Example

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/dependencyparser.mdx

Configuration example for the parser component, specifying the path to the label data.

```ini
### config.cfg
[initialize.components.parser]

[initialize.components.parser.labels]
@readers = "spacy.read_labels.v1"
path = "corpus/labels/parser.json"
```

--------------------------------

### Configure FewShotReader for examples

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/large-language-models.mdx

Configure the FewShotReader to load examples from a YAML, JSON, or JSONL file. Specify the path to the examples file. This is useful for few-shot learning scenarios.

```ini
[components.llm.task.examples]
@misc = "spacy.FewShotReader.v1"
path = "ner_examples.yml"
```

--------------------------------

### Example Raw Task Configuration

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/large-language-models.mdx

This is an example configuration for the spacy.Raw.v1 task. It specifies the task and sets examples to null, indicating no few-shot examples are provided in this configuration.

```ini
[components.llm.task]
@llm_tasks = "spacy.Raw.v1"
examples = null
```

--------------------------------

### Install and Verify spacy-huggingface-hub

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/projects.mdx

Install the spacy-huggingface-hub package to add Hugging Face Hub integration to the spaCy CLI. Verify the installation by checking the help command.

```bash
$ pip install spacy-huggingface-hub
# Check that the CLI is registered
$ python -m spacy huggingface-hub --help
```

--------------------------------

### Get spaCy Installation and Pipeline Info

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/top-level.mdx

Prints information about your spaCy installation, installed pipelines, and local setup. Can be used to display info for a specific model or in Markdown format.

```python
import spacy

spacy.info()
spacy.info("en_core_web_sm")
markdown = spacy.info(markdown=True, silent=True)
```

--------------------------------

### spacy.info

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/top-level.mdx

Provides information about the spaCy installation, installed pipelines, and local setup.

```APIDOC
## spacy.info

### Description
The same as the [`info` command](/api/cli#info). Pretty-print information about your installation, installed pipelines and local setup from within spaCy.

### Method
`spacy.info(model=None, *, markdown=False, silent=False)`

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Request Example
```python
import spacy

spacy.info()
spacy.info("en_core_web_sm")
markdown = spacy.info(markdown=True, silent=True)
```

### Response
#### Success Response (200)
- **str** - Information about the spaCy installation and models, or an empty string if `silent=True`.

#### Response Example
```json
{
  "info": "... spaCy installation details ..."
}
```
```

--------------------------------

### Initialize Example with Predicted and Reference Docs

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/example.mdx

Construct an Example object from predicted and reference Doc objects. If alignment is not provided, it will be initialized automatically.

```python
from spacy.tokens import Doc
from spacy.training import Example

# Assumes `nlp` is an existing pipeline object
pred_words = ["Apply", "some", "sunscreen"]
pred_spaces = [True, True, False]
gold_words = ["Apply", "some", "sun", "screen"]
gold_spaces = [True, True, False, False]
gold_tags = ["VERB", "DET", "NOUN", "NOUN"]
predicted = Doc(nlp.vocab, words=pred_words, spaces=pred_spaces)
reference = Doc(nlp.vocab, words=gold_words, spaces=gold_spaces, tags=gold_tags)
example = Example(predicted, reference)
```

--------------------------------

### ConsoleLogger Configuration Example

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/legacy.mdx

Example configuration for using the spacy.ConsoleLogger.v1 in a training setup.

```APIDOC
## Configuration Example

### Training Configuration
```ini
[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = true
```
```

--------------------------------

### Install spaCy Lookup Data

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/v2-2.mdx

To use lemmatization for languages with only tokenizers, install the lookup data explicitly using pip. No additional setup is required after installation.

```python
from spacy.lang.tr import Turkish

nlp = Turkish()
doc = nlp("Bu bir cümledir.")
# 🚨 This now requires the lookups data to be installed explicitly
print([token.lemma_ for token in doc])
```

--------------------------------

### TrainablePipe.initialize

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/pipe.mdx

Initialize the component for training. `get_examples` should be a function that returns an iterable of Example objects.

```APIDOC
## TrainablePipe.initialize

### Description
Initialize the component for training. `get_examples` should be a function that returns an iterable of [`Example`](/api/example) objects. The data examples are used to **initialize the model** of the component and can either be the full training data or a representative sample.
Initialization includes validating the network, [inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and setting up the label scheme based on the data. This method is typically called by [`Language.initialize`](/api/language#initialize). This method was previously called `begin_training`.

> #### Example
>
> ```python
> pipe = nlp.add_pipe("your_custom_pipe")
> # `examples` is a list of Example objects
> pipe.initialize(lambda: examples, nlp=nlp)
> ```

### Parameters
#### Request Body
- **get_examples** (Callable[[], Iterable[Example]]) - Required - Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects.
- **nlp** (Optional[Language]) - Optional - The current `nlp` object. Defaults to `None`.
```

--------------------------------

### Get Installed Package Path

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/top-level.mdx

Get the file path to an installed package. This function is mainly used to resolve the location of pipeline packages and currently imports the package to find its path.

```python
from spacy import util

util.get_package_path("en_core_web_sm")
# /usr/lib/python3.6/site-packages/en_core_web_sm
```

--------------------------------

### Create Example with Part-of-Speech Tags

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/training.mdx

An alternative, more concise way to create a reference Doc with gold-standard annotations using Example.from_dict, specifying 'tags'.

```python
from spacy.tokens import Doc
from spacy.training import Example

words = ["I", "like", "stuff"]
tags = ["NOUN", "VERB", "NOUN"]
predicted = Doc(nlp.vocab, words=words)
example = Example.from_dict(predicted, {"tags": tags})
```

--------------------------------

### Download Trained Pipeline Package

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/models.mdx

Use the spaCy CLI download command to install the best-matching version of a trained pipeline package compatible with your spaCy installation. For example, 'en_core_web_sm'.

```bash
# Download best-matching version of a package for your spaCy installation
$ python -m spacy download en_core_web_sm
```

--------------------------------

### Build spaCy with setup.py (Deprecated)

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/index.mdx

Use `python setup.py` commands for editable mode and parallel builds. This method is deprecated; pip-based installation is recommended instead.

```bash
$ pip install -r requirements.txt
$ python setup.py build_ext --inplace -j 4
$ python setup.py develop
```

--------------------------------

### TextCatEnsemble.v1 Example Configuration

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/legacy.mdx

Use this configuration to set up the TextCatEnsemble.v1 architecture. Ensure all parameters are correctly defined for your specific use case.

```ini
[model]
@architectures = "spacy.TextCatEnsemble.v1"
exclusive_classes = false
pretrained_vectors = null
width = 64
embed_size = 2000
conv_depth = 2
window_size = 1
ngram_size = 1
dropout = null
nO = null
```

--------------------------------

### Create Example with Named Entities

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/training.mdx

Demonstrates creating a Doc with gold-standard named entities using the BILUO tagging scheme via Example.from_dict.

```python
doc = Doc(nlp.vocab, words=["Facebook", "released", "React", "in", "2014"])
example = Example.from_dict(doc, {"entities": ["U-ORG", "O", "U-TECHNOLOGY", "O", "U-DATE"]})
```

--------------------------------

### Span Input Format

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/visualizers.mdx

Example of the JSON structure for visualizing custom spans, specifying start and end tokens.

```json
{
  "text": "Welcome to the Bank of China.",
  "spans": [
    {"start_token": 3, "end_token": 6, "label": "ORG"},
    {"start_token": 5, "end_token": 6, "label": "GPE"}
  ],
  "tokens": ["Welcome", "to", "the", "Bank", "of", "China", "."]
}
```

--------------------------------

### Migrate Simple Training Style to Example

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/training.mdx

Illustrates the migration from the older simple training style to using the Example.from_dict method in spaCy v3.0.

```diff
text = "Facebook released React in 2014"
```

--------------------------------

### Access Predicted Document

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/example.mdx

Get the Doc object containing the pipeline's predictions from an Example. This is sometimes referred to as `example.x`.

```python
docs = [eg.predicted for eg in examples]
predictions, _ = model.begin_update(docs)
set_annotations(docs, predictions)
```

--------------------------------

### Debug Model - Inspect Dimensions, Parameters, and Gradients

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/cli.mdx

This example shows how to inspect model initialization (Step 1) and updates after a training step (Step 2). It prints layer dimensions, parameters (like the 'W' matrix and 'b' bias), and their sample values, which is crucial for verifying correct propagation and training feedback.

```bash
python -m spacy debug model ./config.cfg tagger -l "5,15" -DIM -PAR -P0 -P1 -P2
```

```text
ℹ Using CPU
ℹ Fixing random seed: 0
ℹ Analysing model with ID 62

========================= STEP 0 - before training =========================
ℹ Layer 5: model ID 60: 'softmax'
ℹ - dim nO: None
ℹ - dim nI: 96
ℹ - param W: None
ℹ - param b: None
ℹ Layer 15: model ID 40: 'residual'
ℹ - dim nO: None
ℹ - dim nI: None

======================= STEP 1 - after initialization =======================
ℹ Layer 5: model ID 60: 'softmax'
ℹ - dim nO: 4
ℹ - dim nI: 96
ℹ - param W: (4, 96) - sample: [0. 0. 0. 0. 0.]
ℹ - param b: (4,) - sample: [0. 0. 0. 0.]
ℹ Layer 15: model ID 40: 'residual'
ℹ - dim nO: 96
ℹ - dim nI: None

========================== STEP 2 - after training ==========================
ℹ Layer 5: model ID 60: 'softmax'
ℹ - dim nO: 4
ℹ - dim nI: 96
ℹ - param W: (4, 96) - sample: [ 0.00283958 -0.00294119 0.00268396 -0.00296219 -0.00297141]
ℹ - param b: (4,) - sample: [0.00300002 0.00300002 0.00300002 0.00300002]
ℹ Layer 15: model ID 40: 'residual'
ℹ - dim nO: 96
ℹ - dim nI: None
```

--------------------------------

### EntityLinker v2 Architecture Configuration

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/architectures.mdx

Example configuration for the spacy.EntityLinker.v2 architecture. This setup includes a tok2vec layer defined by spacy.HashEmbedCNN.v2.

```ini
[model]
@architectures = "spacy.EntityLinker.v2"
nO = null

[model.tok2vec]
@architectures = "spacy.HashEmbedCNN.v2"
pretrained_vectors = null
width = 96
depth = 2
embed_size = 2000
window_size = 1
maxout_pieces = 3
subword_features = true
```

--------------------------------

### Initialize StringStore

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/stringstore.mdx

Create a new StringStore instance and initialize it with a sequence of strings.

```python
from spacy.strings import StringStore

stringstore = StringStore(["apple", "orange"])
```

--------------------------------

### Accessing Transformer Outputs

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/transformer.mdx

Get the last hidden layer output for a specific token. Requires the spacy-transformers library to be installed and configured.

```python
# Get the last hidden layer output for "is" (token index 1)
doc = nlp("This is a text.")
indices = doc._.trf_data.align[1].data.flatten()
last_hidden_state = doc._.trf_data.model_output.last_hidden_state
dim = last_hidden_state.shape[-1]
tensors = last_hidden_state.reshape(-1, dim)[indices]
```

--------------------------------

### Get Loss with EditTreeLemmatizer

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/edittreelemmatizer.mdx

Calculate the loss and gradient for a batch of documents and their predicted scores. Requires the batch of examples and the model's scores.

```python
lemmatizer = nlp.add_pipe("trainable_lemmatizer", name="lemmatizer")
# begin_update returns the output together with a backprop callback
scores, backprop = lemmatizer.model.begin_update([eg.predicted for eg in examples])
loss, d_loss = lemmatizer.get_loss(examples, scores)
```

--------------------------------

### Get Token Alignment from Example

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/example.mdx

Access the alignment object to map tokens between predicted and reference documents. This is useful for comparing token-level correspondences.

```python
tokens_x = ["Apply", "some", "sunscreen"]
x = Doc(vocab, words=tokens_x)
tokens_y = ["Apply", "some", "sun", "screen"]
example = Example.from_dict(x, {"words": tokens_y})
alignment = example.alignment
assert list(alignment.y2x.data) == [[0], [1], [2], [2]]
```

--------------------------------

### Initialize and Save Config File

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/cli.mdx

Use this command to create a training-ready config.cfg file with recommended settings for your use case. It auto-fills default values and can be customized later.

```bash
python -m spacy init config config.cfg --lang en --pipeline ner,textcat --optimize accuracy
```

--------------------------------

### Configure spaCy Sentiment Task

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/large-language-models.mdx

Example configuration for the spacy.Sentiment.v1 task. This setup is used to define the LLM task component within a spaCy pipeline.

```ini
[components.llm.task]
@llm_tasks = "spacy.Sentiment.v1"
examples = null
```

--------------------------------

### Initialize a blank English model with lookups

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/v2-2.mdx

When starting with a blank model and requiring lookup data for lemmatization, explicitly install spaCy with lookups and initialize the model.

```python
import spacy

nlp = spacy.blank("en")
```

--------------------------------

### Install and Use spacy-huggingface-hub CLI

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/v3-1.mdx

Install the package, log in to Hugging Face CLI, package a spaCy model, and push it to the Hub. Ensure you are in the output directory containing the wheel file before pushing.

```bash
pip install spacy-huggingface-hub
huggingface-cli login
python -m spacy package ./en_ner_fashion ./output --build wheel
cd ./output/en_ner_fashion-0.0.0/dist
python -m spacy huggingface-hub push en_ner_fashion-0.0.0-py3-none-any.whl
```

--------------------------------

### Example: Push a specific spaCy pipeline wheel file

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/cli.mdx

An example demonstrating how to push a specific `.whl` file for a spaCy pipeline to the Hugging Face Hub.

```bash
python -m spacy huggingface-hub push en_ner_fashion-0.0.0-py3-none-any.whl
```

--------------------------------

### Get Morphologizer Loss and Gradient

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/morphologizer.mdx

Calculate the loss and gradient for a batch of documents based on predicted scores. Requires the batch of examples and the model's predictions.

```python
morphologizer = nlp.add_pipe("morphologizer")
scores = morphologizer.predict([eg.predicted for eg in examples])
loss, d_loss = morphologizer.get_loss(examples, scores)
```

--------------------------------

### Migrate begin_training to initialize

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/v3.mdx

The begin_training methods have been renamed to initialize in spaCy v3. The initialize method now accepts a function returning Example objects for model setup.

```diff
- nlp.begin_training()
+ nlp.initialize(lambda: examples)
```

--------------------------------

### Install spaCy and Download Models

Source: https://context7.com/explosion/spacy/llms.txt

Install spaCy and download a trained pipeline model. Use `en_core_web_sm` for a small English model or `en_core_web_md` for a larger one with word vectors.

```bash
pip install -U pip setuptools wheel
pip install spacy
```

```bash
python -m spacy download en_core_web_sm
```

```bash
python -m spacy download en_core_web_md
```

--------------------------------

### Initialize Training Configuration

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/v3.mdx

Generate a starter training configuration file using the spacy init config command, specifying language and pipeline components.

```bash
$ python -m spacy init config ./config.cfg --lang en --pipeline tagger,parser
```

--------------------------------

### TextCat v3 Configuration Example

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/large-language-models.mdx

Use this configuration to define labels and their definitions for the TextCat v3 component. This setup is useful for providing context to the LLM about the nature of each label.

```ini
[components.llm.task]
@llm_tasks = "spacy.TextCat.v3"
labels = ["COMPLIMENT", "INSULT"]
examples = null

[components.llm.task.label_definitions]
"COMPLIMENT" = "a polite expression of praise or admiration."
"INSULT" = "a disrespectful or scornfully abusive remark or act."
```

--------------------------------

### Access Transformer Output for a Token

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/curatedtransformer.mdx

Retrieve the hidden state tensor for a specific token from the `Doc._.trf_data` attribute. This example shows how to get the output for the token 'is' (index 1).

```python
# Get the last hidden layer output for "is" (token index 1)
doc = nlp("This is a text.")
tensors = doc._.trf_data.last_hidden_layer_state[1]
```

--------------------------------

### Initialize TextCategorizer with Default and Custom Models

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/textcategorizer.mdx

Demonstrates constructing a TextCategorizer using nlp.add_pipe with default settings and a custom configuration.
Also shows direct instantiation from the class.

```python
# Construction via add_pipe with default model
# Use 'textcat_multilabel' for multi-label classification
textcat = nlp.add_pipe("textcat")

# Construction via add_pipe with custom model
config = {"model": {"@architectures": "my_textcat"}}
textcat = nlp.add_pipe("textcat", config=config)

# Construction from class
# Use 'MultiLabel_TextCategorizer' for multi-label classification
from spacy.pipeline import TextCategorizer
textcat = TextCategorizer(nlp.vocab, model, threshold=0.5)
```

--------------------------------

### DependencyParser Initialization

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/dependencyparser.mdx

Demonstrates how to initialize the DependencyParser, either through nlp.add_pipe with default or custom configurations, or directly from the class.

```APIDOC
## DependencyParser.__init__

### Description
Create a new pipeline instance. In your application, you would normally use a shortcut for this and instantiate the component using its string name and [`nlp.add_pipe`](/api/language#add_pipe).

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Parameters
- **vocab** (object) - The shared vocabulary.
- **model** (object) - The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component.
- **name** (string) - String name of the component instance. Used to add entries to the `losses` during training.
- **moves** (object) - A list of transition names. Inferred from the data if not provided.
- **update_with_oracle_cut_size** (int) - During training, cut long sequences into shorter segments by creating intermediate states based on the gold-standard history. Defaults to `100`.
- **learn_tokens** (bool) - Whether to learn to merge subtokens that are split relative to the gold standard. Experimental. Defaults to `False`.
- **min_action_freq** (int) - The minimum frequency of labelled actions to retain. Rarer labelled actions have their label backed-off to "dep".
- **scorer** (object) - The scoring method. Defaults to [`Scorer.score_deps`](/api/scorer#score_deps) for the attribute "dep" ignoring the labels `p` and `punct` and [`Scorer.score_spans`](/api/scorer/#score_spans) for the attribute "sents".

### Request Example
```python
# Construction via add_pipe with default model
parser = nlp.add_pipe("parser")

# Construction via add_pipe with custom model
config = {"model": {"@architectures": "my_parser"}}
parser = nlp.add_pipe("parser", config=config)

# Construction from class
from spacy.pipeline import DependencyParser
parser = DependencyParser(nlp.vocab, model)
```

### Response
#### Success Response (200)
None

#### Response Example
None
```

--------------------------------

### Python Code Example for Project Integration

Source: https://github.com/explosion/spacy/blob/master/website/UNIVERSE.md

This Python code demonstrates how to load a spaCy model and add a custom pipeline component using a package. Ensure the package is installed and the model is available.

```python
import spacy
import package_name

nlp = spacy.load('en')
nlp.add_pipe(package_name)
```

--------------------------------

### Load and Use Custom Component

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/saving-loading.mdx

This example demonstrates loading a spaCy pipeline with the custom 'snek' component after the package has been installed. It shows that the component can be added using `nlp.add_pipe('snek')` without explicit import, and then used to process a document.

```python
from spacy.lang.en import English

nlp = English()
nlp.add_pipe("snek")  # this now works! 🐍🎉
doc = nlp("I am snek")
```

--------------------------------

### Install and Initialize DVC

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/projects.mdx

Install DVC and initialize your spaCy project as a Git and DVC repository. This sets up DVC for tracking data assets.

```bash
pip install dvc   # Install DVC
git init          # Initialize a Git repo
dvc init          # Initialize a DVC project
```

--------------------------------

### Analyze Pipeline Components in Python

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/processing-pipelines.mdx

Use nlp.analyze_pipes to inspect pipeline components. This example demonstrates adding a 'tagger' and an 'entity_linker' to a blank English pipeline and then analyzing the component configurations. Note that the 'entity_linker' has unmet requirements in this initial setup.

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("tagger")
# This is a problem because it needs entities and sentence boundaries
nlp.add_pipe("entity_linker")
analysis = nlp.analyze_pipes(pretty=True)
```

--------------------------------

### Debug Config Output (Valid Config and Options)

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/cli.mdx

Example output for a valid configuration file, showing registered functions and variables. Use --show-functions and --show-variables flags to display this information.
```bash
$ python -m spacy debug config ./config.cfg --show-functions --show-variables
```

```bash
============================= Config validation =============================
✔ Config is valid

=============================== Variables (6) ===============================

Variable                                   Value
-----------------------------------------  ----------------------------------
${components.tok2vec.model.encode.width}   96
${paths.dev}                               'hello'
${paths.init_tok2vec}                      None
${paths.raw}                               None
${paths.train}                             ''
${system.seed}                             0

========================= Registered functions (17) =========================
ℹ [nlp.tokenizer]
Registry   @tokenizers
Name       spacy.Tokenizer.v1
Module     spacy.language
File       /path/to/spacy/language.py (line 64)
ℹ [components.ner.model]
Registry   @architectures
Name       spacy.TransitionBasedParser.v1
Module     spacy.ml.models.parser
File       /path/to/spacy/ml/models/parser.py (line 11)
ℹ [components.ner.model.tok2vec]
Registry   @architectures
Name       spacy.Tok2VecListener.v1
Module     spacy.ml.models.tok2vec
File       /path/to/spacy/ml/models/tok2vec.py (line 16)
ℹ [components.parser.model]
Registry   @architectures
Name       spacy.TransitionBasedParser.v1
Module     spacy.ml.models.parser
File       /path/to/spacy/ml/models/parser.py (line 11)
ℹ [components.parser.model.tok2vec]
Registry   @architectures
Name       spacy.Tok2VecListener.v1
Module     spacy.ml.models.tok2vec
File       /path/to/spacy/ml/models/tok2vec.py (line 16)
ℹ [components.tagger.model]
Registry   @architectures
Name       spacy.Tagger.v1
Module     spacy.ml.models.tagger
File       /path/to/spacy/ml/models/tagger.py (line 9)
ℹ [components.tagger.model.tok2vec]
Registry   @architectures
Name       spacy.Tok2VecListener.v1
Module     spacy.ml.models.tok2vec
File       /path/to/spacy/ml/models/tok2vec.py (line 16)
ℹ [components.tok2vec.model]
Registry   @architectures
Name       spacy.Tok2Vec.v1
Module     spacy.ml.models.tok2vec
File       /path/to/spacy/ml/models/tok2vec.py (line 72)
ℹ [components.tok2vec.model.embed]
Registry   @architectures
Name       spacy.MultiHashEmbed.v1
Module     spacy.ml.models.tok2vec
File
/path/to/spacy/ml/models/tok2vec.py (line 93)
ℹ [components.tok2vec.model.encode]
Registry   @architectures
Name       spacy.MaxoutWindowEncoder.v1
Module     spacy.ml.models.tok2vec
File       /path/to/spacy/ml/models/tok2vec.py (line 207)
ℹ [corpora.dev]
Registry   @readers
Name       spacy.Corpus.v1
Module     spacy.training.corpus
File       /path/to/spacy/training/corpus.py (line 18)
ℹ [corpora.train]
Registry   @readers
Name       spacy.Corpus.v1
Module     spacy.training.corpus
File       /path/to/spacy/training/corpus.py (line 18)
ℹ [training.logger]
Registry   @loggers
Name       spacy.ConsoleLogger.v1
Module     spacy.training.loggers
File       /path/to/spacy/training/loggers.py (line 8)
ℹ [training.batcher]
Registry   @batchers
```

--------------------------------

### Extract Named Entities from Text

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/101/_named-entities.mdx

Use this Python code to load a spaCy model, process a document, and iterate through the detected named entities, printing their text, start and end character positions, and labels. Ensure you have the 'en_core_web_sm' model installed.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
```

--------------------------------

### Example project.yml Structure

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/projects.mdx

This is an example of a project.yml file, which defines project assets and commands. It is similar to CI pipeline configuration files.

```yaml
%%GITHUB_PROJECTS/pipelines/tagger_parser_ud/project.yml
```

--------------------------------

### Load Pipeline Meta Data

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/top-level.mdx

Get a pipeline's meta.json from a file path and validate its contents. The meta data typically includes details about author, licensing, data sources, and version.
Use this to retrieve metadata about installed pipelines.

```python
from spacy import util

meta = util.load_meta("/path/to/meta.json")
```

--------------------------------

### Install spacy-huggingface-hub and Login to Hugging Face

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/cli.mdx

Install the necessary package and log in to your Hugging Face account to enable uploading pipelines.

```bash
pip install spacy-huggingface-hub
huggingface-cli login
```

--------------------------------

### NER Entity Data Structure Example

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/top-level.mdx

This JSON structure represents labelled spans within a document. It includes the text, a list of spans with their start and end token indices and labels, and the document's tokens. Optional fields like 'title' and 'settings' can be provided for visualization.

```json
{
  "text": "Welcome to the Bank of China.",
  "spans": [
    { "start_token": 3, "end_token": 6, "label": "ORG" },
    { "start_token": 5, "end_token": 6, "label": "GPE" }
  ],
  "tokens": ["Welcome", "to", "the", "Bank", "of", "China", "."]
}
```

--------------------------------

### Create Example from Dictionary

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/data-formats.mdx

Use `Example.from_dict` to create a training instance from a predicted Doc and a dictionary of gold-standard annotations. This method is part of the internal training API.

```python
from spacy.training import Example

example = Example.from_dict(doc, gold_dict)
```

--------------------------------

### Basic Span Visualization

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/visualizers.mdx

This example demonstrates how to use the `displacy.serve` function with the 'span' style to visualize overlapping spans in a text. It requires importing `spacy` and `displacy`, creating a blank English model, processing text, and defining spans with their start, end, and label.
```python
import spacy
from spacy import displacy
from spacy.tokens import Span

text = "Welcome to the Bank of China."

nlp = spacy.blank("en")
doc = nlp(text)

doc.spans["sc"] = [
    Span(doc, 3, 6, "ORG"),
    Span(doc, 5, 6, "GPE"),
]

displacy.serve(doc, style="span")
```

--------------------------------

### Create Configuration File

Source: https://github.com/explosion/spacy/blob/master/examples/README.md

Initializes a spaCy configuration file for NER pipeline training.

```text
Running command: /home/user/venv/bin/python -m spacy init config --lang en --pipeline ner configs/config.cfg --force
ℹ Generated config template specific for your use case
- Language: en
- Pipeline: ner
- Optimize for: efficiency
- Hardware: CPU
- Transformer: None
✔ Auto-filled config with all values
✔ Saved config
configs/config.cfg
You can now add your data and train your pipeline:
python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy
```

--------------------------------

### Score TextCategorizer Examples

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/textcategorizer.mdx

Use the score method to evaluate a batch of examples with the TextCategorizer. The input 'examples' should be an iterable of Example objects.

```python
scores = textcat.score(examples)
```

--------------------------------

### Project Training Configuration

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/projects.mdx

Example `project.yml` configuration for a 'train' command, specifying the script, dependencies, and expected outputs.
```yaml
- name: train
  help: 'Train a spaCy pipeline using the specified corpus and config'
  script:
    - 'spacy train ./config.cfg --output training/'
  deps:
    - 'corpus/train'
    - 'corpus/dev'
    - 'config.cfg'
  outputs:
    - 'training/model-best'
```

--------------------------------

### Fill Configuration Defaults

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/training.mdx

Use `init fill-config` to populate a base configuration file with default settings. This ensures a complete and reproducible configuration for training.

```bash
python -m spacy init fill-config base_config.cfg config.cfg
```

--------------------------------

### Install spaCy with CUDA GPU Support

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/index.mdx

Install spaCy with GPU support by specifying the CUDA version in the pip install command. This installs CuPy for GPU array compatibility.

```bash
pip install -U %%SPACY_PKG_NAME[cuda113]%%SPACY_PKG_FLAGS
```

--------------------------------

### Few-Shot Learning Example for Summarization

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/large-language-models.mdx

An example of how to structure few-shot learning examples for the summarization task in YAML format.

```APIDOC
## Few-Shot Summarization Example

### Description
This is an example of a few-shot learning entry for the summarization task. It includes the input text and its corresponding desired summary.

### Method
N/A (Data format for few-shot examples)

### Endpoint
N/A

### Request Example
```yaml
- text: >
    The United Nations, referred to informally as the UN, is an intergovernmental organization whose stated purposes are to maintain international peace and security, develop friendly relations among nations, achieve international cooperation, and serve as a centre for harmonizing the actions of nations. It is the world's largest international organization.
    The UN is headquartered on international territory in New York City, and the organization has other offices in Geneva, Nairobi, Vienna, and The Hague, where the International Court of Justice is headquartered.\n\n The UN was established after World War II with the aim of preventing future world wars, and succeeded the League of Nations, which was characterized as ineffective.
  summary: 'The UN is an international organization that promotes global peace, cooperation, and harmony. Established after WWII, its purpose is to prevent future world wars.'
```
```

--------------------------------

### Install spaCy with conda

Source: https://github.com/explosion/spacy/blob/master/README.md

Install spaCy from the conda-forge channel using the conda package manager. This is an alternative to pip installation.

```bash
conda install -c conda-forge spacy
```

--------------------------------

### Install spaCy in Editable Mode

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/index.mdx

Install spaCy in editable mode for development. Changes to Python files are reflected immediately, but Cython file edits require rerunning the install command. Ensure previous installs are removed.

```bash
$ pip install -r requirements.txt
$ pip install --no-build-isolation --editable .
```

--------------------------------

### Serve Dependency Visualization with Options

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/top-level.mdx

Use this to serve a dependency visualization with custom options. The `options` dictionary can include keys like `compact`, `color`, and others detailed in the table.

```python
from spacy import displacy

options = {"compact": True, "color": "blue"}
displacy.serve(doc, style="dep", options=options)
```

--------------------------------

### Few-Shot Lemma Examples

Source: https://github.com/explosion/spacy/blob/master/website/docs/api/large-language-models.mdx

YAML format for providing few-shot examples for the spacy.Lemma.v1 task.
Each example includes text and its corresponding lemmas.

```yaml
- text: I'm buying ice cream.
  lemmas:
    - 'I': 'I'
    - "'m": 'be'
    - 'buying': 'buy'
    - 'ice': 'ice'
    - 'cream': 'cream'
    - '.': '.'

- text: I've watered the plants.
  lemmas:
    - 'I': 'I'
    - "'ve": 'have'
    - 'watered': 'water'
    - 'the': 'the'
    - 'plants': 'plant'
    - '.': '.'
```

--------------------------------

### Install spaCy Model from Local File

Source: https://github.com/explosion/spacy/blob/master/website/docs/usage/models.mdx

Install a spaCy model from a local wheel file or tar.gz archive. Ensure the path to the file is correct.

```bash
$ pip install /Users/you/en_core_web_sm-3.0.0-py3-none-any.whl
```

```bash
$ pip install /Users/you/en_core_web_sm-3.0.0.tar.gz
```
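
After installing from a local file, it can help to confirm that the package is visible to the current Python environment before trying to load it. The following is a minimal sketch that uses only the standard library. The helper name `is_pipeline_installed` is hypothetical (it is not part of spaCy), and `en_core_web_sm` is simply the package name from the commands above.

```python
from importlib import util as importlib_util


def is_pipeline_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found.

    Hypothetical helper for illustration; not part of spaCy.
    """
    # find_spec returns None when no top-level module or package
    # with the given name is found on the import path.
    return importlib_util.find_spec(package_name) is not None


if __name__ == "__main__":
    name = "en_core_web_sm"  # assumed package name, taken from the install example
    if is_pipeline_installed(name):
        print(f"{name} is installed; spacy.load({name!r}) should be able to find it.")
    else:
        print(f"{name} was not found; check the path passed to pip install.")
```

This check only confirms that the package can be located in the active environment. It does not verify that the pipeline version is compatible with the installed spaCy version.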