### Install and Configure Dependencies in Livebook/Scripts

Source: https://github.com/elixir-nx/bumblebee/blob/main/README.md

Use `Mix.install/2` in notebooks or scripts to install Bumblebee and EXLA and, in the same call, configure Nx to use the EXLA backend.

```elixir
Mix.install(
  [
    {:bumblebee, "~> 0.6.0"},
    {:exla, ">= 0.0.0"}
  ],
  config: [nx: [default_backend: EXLA.Backend]]
)
```

--------------------------------

### Start Nx.Serving as a Batched Inference Server

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Starts a Bumblebee serving as a supervised OTP process. The process automatically batches concurrent requests from multiple clients to maximize throughput. When the serving is compiled with `compile: [batch_size: ..., sequence_length: ...]`, keep the `batch_size` passed to `Nx.Serving` consistent with the compiled one, as the example does.

```elixir
# In your application supervisor (application.ex)
defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    {:ok, bert} = Bumblebee.load_model({:hf, "google-bert/bert-base-uncased"})
    {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "google-bert/bert-base-uncased"})

    serving = Bumblebee.Text.fill_mask(bert, tokenizer,
      compile: [batch_size: 8, sequence_length: 100],
      defn_options: [compiler: EXLA]
    )

    children = [
      {Nx.Serving, serving: serving, name: MyApp.Serving, batch_size: 8, batch_timeout: 100}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end
end

# Call from anywhere in the application (requests are automatically batched)
Nx.Serving.batched_run(MyApp.Serving, "The [MASK] sat on the mat.")
#=> %{predictions: [%{token: "cat", score: 0.87}, ...]}
```
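Batching only pays off when several callers hit the serving at about the same time. The sketch below shows one way to fan out requests with `Task.async_stream/3`. `MaskFillClient` and `fill_many` are hypothetical names, not part of Bumblebee or Nx, and the one-argument form assumes the `MyApp.Serving` process from the example above is running. The block itself runs against a stub function so it can be tried without loading a model.

```elixir
defmodule MaskFillClient do
  # Hypothetical helper; not part of Bumblebee or Nx.

  # Default: send every text to the supervised serving started above.
  def fill_many(texts) do
    fill_many(texts, &Nx.Serving.batched_run(MyApp.Serving, &1))
  end

  # `run` is any 1-arity function, so a stub can replace the real serving.
  def fill_many(texts, run) when is_function(run, 1) do
    texts
    # Each text is handled in its own task; results keep the input order.
    |> Task.async_stream(run, max_concurrency: 8, timeout: 30_000)
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

# Stub standing in for the serving (assumption: same result shape as above).
stub = fn text -> %{predictions: [%{token: String.upcase(text), score: 1.0}]} end

results = MaskFillClient.fill_many(["cat", "dog"], stub)
IO.inspect(results)
```

With the real serving, requests issued by these concurrent tasks within the `batch_timeout` window can be grouped into one batch by the serving process.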

--------------------------------

### Nx.Serving as a Production Process

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Starts a supervised process that automatically batches concurrent requests from multiple clients, maximizing throughput on CPU or GPU.

```APIDOC
## `Nx.Serving` as a Production Process — Batched inference server

Any Bumblebee serving can be started as a supervised process that automatically batches concurrent requests from multiple clients, maximising throughput on CPU or GPU.

```elixir
# In your application supervisor (application.ex)
defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    {:ok, bert} = Bumblebee.load_model({:hf, "google-bert/bert-base-uncased"})
    {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "google-bert/bert-base-uncased"})

    serving = Bumblebee.Text.fill_mask(bert, tokenizer,
      compile: [batch_size: 8, sequence_length: 100],
      defn_options: [compiler: EXLA]
    )

    children = [
      {Nx.Serving, serving: serving, name: MyApp.Serving, batch_size: 8, batch_timeout: 100}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end
end

# Call from anywhere in the application (requests are automatically batched)
Nx.Serving.batched_run(MyApp.Serving, "The [MASK] sat on the mat.")
#=> %{predictions: [%{token: "cat", score: 0.87}, ...]}
```
```

--------------------------------

### Compile Bumblebee Serving with EXLA

Source: https://github.com/elixir-nx/bumblebee/blob/main/examples/phoenix/README.md

Create a Bumblebee serving that uses EXLA as the compiler. Passing `:compile` with a fixed batch size and sequence length compiles the computation up front, when the serving starts, rather than on the first request. This makes efficient use of the GPU for neural network models.

```elixir
serving = 
  Bumblebee.Text.text_embedding(model_info, tokenizer, 
    compile: [batch_size: 1, sequence_length: 512],
    defn_options: [compiler: EXLA]
  )
```

--------------------------------

### Load Bumblebee Models

Source: https://github.com/elixir-nx/bumblebee/blob/main/examples/phoenix/README.md

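Example of loading multiple Bumblebee models from Hugging Face. `load_xyz` is a placeholder for the actual loaders (such as `Bumblebee.load_model/2` or `Bumblebee.load_tokenizer/2`), and the repository names are illustrative. Call the function during application startup or deployment to pre-cache the models.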

```elixir
def load_all do
  Bumblebee.load_xyz({:hf, "microsoft/resnet"})
  Bumblebee.load_xyz({:hf, "foo/bar/baz"})
end
```

--------------------------------

### Perform Text Fill-Mask Task with Bumblebee

Source: https://github.com/elixir-nx/bumblebee/blob/main/README.md

Load a pre-trained BERT model and tokenizer from Hugging Face Hub, create a text fill-mask serving pipeline, and use it to predict masked words in a sentence. This example demonstrates a common NLP task using Bumblebee's high-level APIs.

```elixir
{:ok, model_info} = Bumblebee.load_model({:hf, "google-bert/bert-base-uncased"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "google-bert/bert-base-uncased"})

serving = Bumblebee.Text.fill_mask(model_info, tokenizer)
Nx.Serving.run(serving, "The capital of [MASK] is Paris.")
#=> %{
#=>   predictions: [
#=>     %{score: 0.9279842972755432, token: "france"},
#=>     %{score: 0.008412551134824753, token: "brittany"},
#=>     %{score: 0.007433671969920397, token: "algeria"},
#=>     %{score: 0.004957548808306456, token: "department"},
#=>     %{score: 0.004369721747934818, token: "reunion"}
#=>   ]
#=> }
```
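The result is a plain map, so ordinary `Enum` functions are enough to post-process it. The following sketch picks the highest-scoring token and filters by a threshold. It uses a literal copy of part of the output above rather than calling the serving, and the `0.01` cutoff is an arbitrary value chosen for illustration.

```elixir
# Literal sample in the shape returned by the fill-mask serving above.
result = %{
  predictions: [
    %{score: 0.9279842972755432, token: "france"},
    %{score: 0.008412551134824753, token: "brittany"},
    %{score: 0.007433671969920397, token: "algeria"}
  ]
}

# Highest-scoring prediction.
best = Enum.max_by(result.predictions, & &1.score)

# Tokens whose score clears an arbitrary, illustrative threshold.
confident =
  result.predictions
  |> Enum.filter(&(&1.score >= 0.01))
  |> Enum.map(& &1.token)

IO.inspect(best.token)
IO.inspect(confident)
```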

--------------------------------

### Bumblebee.load_tokenizer/2

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Loads a tokenizer from a specified repository, downloading and building a fast (Rust-based) tokenizer from `tokenizer.json`. The tokenizer type is inferred from the model's `config.json`.

```APIDOC
## Bumblebee.load_tokenizer/2

### Description
Loads a tokenizer from a repository. Downloads and builds a fast (Rust-based) tokenizer from `tokenizer.json`. The tokenizer type is inferred from the model's `config.json`.

### Method
`Bumblebee.load_tokenizer/2`

### Parameters
#### Path Parameters
- `source` (tuple): Specifies the source of the tokenizer. Can be `{:hf, "repository_name"}` for Hugging Face Hub or `{:local, "/path/to/tokenizer/dir"}` for a local directory.
- `opts` (keyword list): Optional keyword list for configuration.

### Request Example
```elixir
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "google-bert/bert-base-uncased"})

# Apply tokenizer directly
inputs = Bumblebee.apply_tokenizer(tokenizer, "The capital of France is Paris.")
# inputs => %{"input_ids" => #Nx.Tensor<...>, "attention_mask" => #Nx.Tensor<...>, ...}

# Tokenize a batch
inputs = Bumblebee.apply_tokenizer(tokenizer, [
  "Hello world",
  "Elixir is great"
])

# Configure tokenizer options via Bumblebee.configure/2
tokenizer = Bumblebee.configure(tokenizer, length: 128)
```

### Response
#### Success Response
Returns `{:ok, tokenizer}` where `tokenizer` is the loaded tokenizer object.

#### Response Example
```elixir
# A tokenizer struct representing the loaded tokenizer
%Bumblebee.Text.PreTrainedTokenizer{...}
```
```

--------------------------------

### Configure Nx to Use EXLA Backend

Source: https://github.com/elixir-nx/bumblebee/blob/main/README.md

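Configure Nx to use the EXLA backend by default in your `config/config.exs` file, so that tensors are allocated and operations run on EXLA instead of the slower default binary backend. The compiler is selected separately, for example with `defn_options: [compiler: EXLA]` when building a serving.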

```elixir
import Config

config :nx, default_backend: EXLA.Backend
```

--------------------------------

### Configure Nx Default Backend

Source: https://github.com/elixir-nx/bumblebee/blob/main/examples/phoenix/README.md

Set the default Nx backend to EXLA's `:host` (CPU) client so that one-off tensor operations run on the CPU and the GPU stays reserved for large computations.

```elixir
config :nx, :default_backend, {EXLA.Backend, client: :host}
```

--------------------------------

### Load Tokenizer with Bumblebee

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Use `Bumblebee.load_tokenizer/2` to download and build a fast tokenizer. The tokenizer type is inferred from the model's `config.json`. You can apply the tokenizer directly or configure its options.

```elixir
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "google-bert/bert-base-uncased"})
```

```elixir
# Apply tokenizer directly
inputs = Bumblebee.apply_tokenizer(tokenizer, "The capital of France is Paris.")
# inputs => %{"input_ids" => #Nx.Tensor<...>, "attention_mask" => #Nx.Tensor<...>, ...}
```

```elixir
# Tokenize a batch
inputs = Bumblebee.apply_tokenizer(tokenizer, [
  "Hello world",
  "Elixir is great"
])
```

```elixir
# Configure tokenizer options via Bumblebee.configure/2
tokenizer = Bumblebee.configure(tokenizer, length: 128)
```

--------------------------------

### Add Bumblebee and EXLA Dependencies

Source: https://github.com/elixir-nx/bumblebee/blob/main/README.md

Add Bumblebee and the optional EXLA dependency to your `mix.exs` file. EXLA is recommended for just-in-time model compilation and optimized CPU/GPU performance.

```elixir
def deps do
  [
    {:bumblebee, "~> 0.6.0"},
    {:exla, ">= 0.0.0"}
  ]
end
```

--------------------------------

### Create Local Model Checkpoints with Python

Source: https://github.com/elixir-nx/bumblebee/blob/main/AGENTS.md

This Python script generates local checkpoints for various SmolLM3 model types using a specified configuration. It saves these checkpoints to be used for testing.

```python
from transformers import SmolLM3Config, SmolLM3Model, SmolLM3ForCausalLM, SmolLM3ForQuestionAnswering, SmolLM3ForSequenceClassification, SmolLM3ForTokenClassification

config = SmolLM3Config(
  vocab_size=1024,
  hidden_size=32,
  num_hidden_layers=2,
  num_attention_heads=4,
  intermediate_size=37,
  hidden_act="gelu",
  hidden_dropout_prob=0.1,
  attention_probs_dropout_prob=0.1,
  max_position_embeddings=512,
  type_vocab_size=16,
  is_decoder=False,
  initializer_range=0.02,
  pad_token_id=0,
  no_rope_layers=[0, 1]
)

for c in [SmolLM3Model, SmolLM3ForCausalLM, SmolLM3ForQuestionAnswering, SmolLM3ForSequenceClassification, SmolLM3ForTokenClassification]:
  name = c.__name__
  c(config).save_pretrained(f"bumblebee-testing/tiny-random-{name}", repo_id=f"bumblebee-testing/tiny-random-{name}")
```

--------------------------------

### Load Generation Configuration with Bumblebee

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Use `Bumblebee.load_generation_config/2` to load sampling and decoding configuration for text generation models. Generation parameters can be adjusted using `Bumblebee.configure/2`.

```elixir
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai-community/gpt2"})
```

```elixir
# Adjust generation parameters
generation_config = Bumblebee.configure(generation_config,
  max_new_tokens: 100,
  min_new_tokens: 10
)
```

--------------------------------

### Bumblebee.load_generation_config/2

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Loads the generation configuration for text generation models, which includes sampling and decoding parameters.

```APIDOC
## Bumblebee.load_generation_config/2

### Description
Loads sampling and decoding configuration for text generation models.

### Method
`Bumblebee.load_generation_config/2`

### Parameters
#### Path Parameters
- `source` (tuple): Specifies the source of the generation configuration. Can be `{:hf, "repository_name"}` for Hugging Face Hub or `{:local, "/path/to/config/dir"}` for a local directory.
- `opts` (keyword list): Optional keyword list for configuration.

### Request Example
```elixir
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai-community/gpt2"})

# Adjust generation parameters
generation_config = Bumblebee.configure(generation_config,
  max_new_tokens: 100,
  min_new_tokens: 10
)
```

### Response
#### Success Response
Returns `{:ok, generation_config}` where `generation_config` is a `%Bumblebee.Text.GenerationConfig{}` struct.

#### Response Example
```elixir
# After the `Bumblebee.configure/2` call in the request example
%Bumblebee.Text.GenerationConfig{max_new_tokens: 100, min_new_tokens: 10, ...}
```
```

--------------------------------

### Load Pre-trained Model with Bumblebee

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Use `Bumblebee.load_model/2` to download and cache model parameters and configuration. Supports loading from Hugging Face Hub or local directories, with options for architecture override, custom parameter types, and specific subdirectories.

```elixir
# Load BERT from Hugging Face (auto-infers architecture)
{:ok, bert} = Bumblebee.load_model({:hf, "google-bert/bert-base-uncased"})
%{model: model, params: params, spec: spec} = bert
```

```elixir
# Load with explicit architecture
{:ok, resnet} = Bumblebee.load_model({:hf, "microsoft/resnet-50"}, architecture: :base)
```

```elixir
# Load with bfloat16 precision for GPU efficiency
{:ok, llama} = Bumblebee.load_model({:hf, "meta-llama/Llama-2-7b-hf"}, type: :bf16)
```

```elixir
# Load a specific subdir (e.g., for Stable Diffusion components)
{:ok, unet} = Bumblebee.load_model({:hf, "CompVis/stable-diffusion-v1-4", subdir: "unet"})
```

```elixir
# Load from local directory
{:ok, model_info} = Bumblebee.load_model({:local, "/path/to/model/dir"})
```

```elixir
# Customise spec at load time
{:ok, resnet} = 
  Bumblebee.load_model({:hf, "microsoft/resnet-50"}, spec_overrides: [num_labels: 10])
```
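All of the calls above pattern-match on `{:ok, ...}`, which crashes with a `MatchError` if loading fails. `Bumblebee.load_model/2` returns an `{:error, reason}` tuple in that case, so a small wrapper can produce a clearer message. `ModelLoader` and `load!` are hypothetical names introduced for this sketch, and the block exercises the wrapper with stub loaders so that it runs without network access.

```elixir
defmodule ModelLoader do
  # Hypothetical wrapper; not part of Bumblebee.

  # Default: delegate to the real loader.
  def load!(source), do: load!(source, &Bumblebee.load_model/1)

  # `load` is any 1-arity function returning {:ok, info} or {:error, reason}.
  def load!(source, load) when is_function(load, 1) do
    case load.(source) do
      {:ok, model_info} ->
        model_info

      {:error, reason} ->
        raise "could not load #{inspect(source)}: #{inspect(reason)}"
    end
  end
end

# Stub loaders standing in for Bumblebee.load_model/1 (assumptions for the demo).
ok_loader = fn _source -> {:ok, %{model: :stub, params: %{}, spec: :stub}} end
failing_loader = fn _source -> {:error, "repository not found"} end

model_info = ModelLoader.load!({:hf, "org/repo"}, ok_loader)

message =
  try do
    ModelLoader.load!({:hf, "org/repo"}, failing_loader)
  rescue
    error in RuntimeError -> error.message
  end

IO.puts(message)
```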

--------------------------------

### Load Featurizer with Bumblebee

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Use `Bumblebee.load_featurizer/2` to load a featurizer for preprocessing images or audio into model-compatible tensors. Apply the featurizer to input data like images or audio files.

```elixir
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})
```

```elixir
# Apply featurizer to an image
{:ok, img} = StbImage.read_file("/path/to/image.jpg")
inputs = Bumblebee.apply_featurizer(featurizer, [img])
# inputs => %{"pixel_values" => #Nx.Tensor<f32[1][224][224][3] ...>} (channels-last)
```

```elixir
# Whisper audio featurizer
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-tiny"})
```

--------------------------------

### Bumblebee.load_featurizer/2

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Loads a featurizer, which is used for preprocessing images or audio into tensors compatible with machine learning models.

```APIDOC
## Bumblebee.load_featurizer/2

### Description
Loads a featurizer (image/audio preprocessor) for preprocessing images or audio into model-compatible tensors.

### Method
`Bumblebee.load_featurizer/2`

### Parameters
#### Path Parameters
- `source` (tuple): Specifies the source of the featurizer. Can be `{:hf, "repository_name"}` for Hugging Face Hub or `{:local, "/path/to/featurizer/dir"}` for a local directory.
- `opts` (keyword list): Optional keyword list for configuration.

### Request Example
```elixir
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})

# Apply featurizer to an image
{:ok, img} = StbImage.read_file("/path/to/image.jpg")
inputs = Bumblebee.apply_featurizer(featurizer, [img])
# inputs => %{"pixel_values" => #Nx.Tensor<f32[1][224][224][3] ...>} (channels-last)

# Whisper audio featurizer
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-tiny"})
```

### Response
#### Success Response
Returns `{:ok, featurizer}` where `featurizer` is the loaded featurizer object.

#### Response Example
```elixir
# A featurizer struct whose module depends on the repository,
# for example the one used by "microsoft/resnet-50"
%Bumblebee.Vision.ConvNextFeaturizer{...}
```
```

--------------------------------

### Generate Reference Values with Python

Source: https://github.com/elixir-nx/bumblebee/blob/main/AGENTS.md

Use this Python script to obtain reference output values from Hugging Face Transformers models for testing purposes. It prints the shape and a slice of the last hidden state.

```python
from transformers import BertModel
import torch

model = BertModel.from_pretrained("hf-internal-testing/tiny-random-BertModel")

inputs = {
  "input_ids": torch.tensor([[10, 20, 30, 40, 50, 60, 70, 80, 0, 0]]),
  "attention_mask": torch.tensor([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0]])
}

outputs = model(**inputs)

print(outputs.last_hidden_state.shape)
print(outputs.last_hidden_state[:, 1:4, 1:4])

#=> torch.Size([1, 10, 32])
#=> tensor([[[-0.2331,  1.7817,  1.1736],
#=>          [-1.1001,  1.3922, -0.3391],
#=>          [ 0.0408,  0.8677, -0.0779]]], grad_fn=<SliceBackward0>)
```

--------------------------------

### Generate Image Captions with BLIP

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Use `Bumblebee.Vision.image_to_text/5` to generate natural language descriptions for images. Load the BLIP model, featurizer, tokenizer, and generation config first.

```elixir
{:ok, blip} = Bumblebee.load_model({:hf, "Salesforce/blip-image-captioning-base"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "Salesforce/blip-image-captioning-base"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "Salesforce/blip-image-captioning-base"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "Salesforce/blip-image-captioning-base"})

serving =
  Bumblebee.Vision.image_to_text(blip, featurizer, tokenizer, generation_config,
    defn_options: [compiler: EXLA]
  )

image = StbImage.read_file!("/path/to/cat_on_chair.jpg")
Nx.Serving.run(serving, image)
#=> %{results: [%{text: "a cat sitting on a chair"}]}
```

--------------------------------

### Bumblebee.configure/2

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

A versatile function to build or update configuration structs for various Bumblebee components, including model specs, featurizers, tokenizers, and generation configurations.

```APIDOC
## Bumblebee.configure/2

### Description
Builds or updates a configuration struct. This function is used to create new configurations or modify existing ones for model specs, featurizers, schedulers, tokenizers, or generation configs with custom options.

### Method
`Bumblebee.configure/2`

### Parameters
#### Path Parameters
- `target` (module or struct): The target configuration struct or module to build/update (e.g., `Bumblebee.Vision.ResNet`, an existing `tokenizer`, or `generation_config`).
- `opts` (keyword list): Keyword list of options to set or update in the configuration.

### Request Example
```elixir
# Build a model spec from scratch
spec = Bumblebee.configure(Bumblebee.Vision.ResNet,
  architecture: :for_image_classification,
  num_labels: 200
)

# Build a featurizer configuration from its module
featurizer = Bumblebee.configure(Bumblebee.Vision.ConvNextFeaturizer, resize_method: :bilinear)

# Update generation config
generation_config = Bumblebee.configure(generation_config,
  max_new_tokens: 50,
  min_new_tokens: 5
)

# Build model from spec
model = Bumblebee.build_model(spec)
```

### Response
#### Success Response
Returns the updated or newly created configuration struct.

#### Response Example
```elixir
# Depending on the target, returns the corresponding configuration struct
%Bumblebee.Vision.ResNet{...}
# or
%Bumblebee.Text.PreTrainedTokenizer{...}
# or
%Bumblebee.Text.GenerationConfig{max_new_tokens: 50, min_new_tokens: 5, ...}
```
```

--------------------------------

### Text Generation with Bumblebee

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Generate text continuations from a prompt using `Bumblebee.Text.generation/4`, which also supports streaming output for real-time applications. Load the model, tokenizer, and generation configuration first.

```elixir
{:ok, model_info} = Bumblebee.load_model({:hf, "openai-community/gpt2"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai-community/gpt2"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai-community/gpt2"})
generation_config = Bumblebee.configure(generation_config, max_new_tokens: 15)

# Standard batch generation
serving = Bumblebee.Text.generation(model_info, tokenizer, generation_config)

Nx.Serving.run(serving, "Elixir is a functional")
#=> %{
#=>   results: [
#=>     %{text: " programming language that is designed to be used in a variety of applications. It", token_summary: %{input: 5, output: 15, padding: 0}}
#=>   ]
#=> }

# Streaming generation - returns a lazy stream of text chunks
serving = Bumblebee.Text.generation(model_info, tokenizer, generation_config, stream: true)

Nx.Serving.run(serving, "Elixir is a functional") |> Enum.to_list()
#=> [" programming", " language", " that", " is", " designed", " to", " be", " used",
#=>  " in", " a", " variety", " of", " applications.", " It"]
```
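Calling `Enum.to_list/1` waits for the whole stream, which defeats the purpose of streaming in an interactive setting. A minimal sketch of consuming chunks as they arrive is shown below. The `chunks` stream is a stand-in built from literal strings (an assumption made so the block runs without a model); with a real serving it would be the value returned by `Nx.Serving.run/2` when `stream: true` is set.

```elixir
# Stand-in for the lazy stream returned by the streaming serving above.
chunks = Stream.map([" programming", " language", " that", " is", " designed"], & &1)

# Print each chunk as soon as it is produced and accumulate the full text.
text =
  Enum.reduce(chunks, "", fn chunk, acc ->
    IO.write(chunk)
    acc <> chunk
  end)

IO.puts("")
IO.inspect(text)
```

In a Phoenix LiveView or similar setting, the `IO.write/1` call would typically be replaced by sending the chunk to the client.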

--------------------------------

### Bumblebee.load_model/2

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Loads a pre-trained model from Hugging Face Hub or a local directory. It downloads and caches model parameters and configuration, returning a map containing the model, parameters, and specification. Supports architecture overrides, custom parameter types, and custom backends.

```APIDOC
## Bumblebee.load_model/2

### Description
Loads a pre-trained model from Hugging Face Hub or local disk. Returns a `model_info` map with `:model`, `:params`, and `:spec` keys. Supports architecture override, custom parameter types (e.g., `:bf16`), and custom backends.

### Method
`Bumblebee.load_model/2`

### Parameters
#### Path Parameters
- `source` (tuple): Specifies the source of the model. Can be `{:hf, "repository_name"}` for Hugging Face Hub or `{:local, "/path/to/model/dir"}` for a local directory. For Hugging Face sources, an optional `subdir` key can be provided within the tuple.
- `opts` (keyword list): Optional keyword list for configuration. Supported options include `architecture`, `type`, `spec_overrides`.

### Request Example
```elixir
# Load BERT from Hugging Face (auto-infers architecture)
{:ok, bert} = Bumblebee.load_model({:hf, "google-bert/bert-base-uncased"})
%{model: model, params: params, spec: spec} = bert

# Load with explicit architecture
{:ok, resnet} = Bumblebee.load_model({:hf, "microsoft/resnet-50"}, architecture: :base)

# Load with bfloat16 precision for GPU efficiency
{:ok, llama} = Bumblebee.load_model({:hf, "meta-llama/Llama-2-7b-hf"}, type: :bf16)

# Load a specific subdir (e.g., for Stable Diffusion components)
{:ok, unet} = Bumblebee.load_model({:hf, "CompVis/stable-diffusion-v1-4", subdir: "unet"})

# Load from local directory
{:ok, model_info} = Bumblebee.load_model({:local, "/path/to/model/dir"})

# Customise spec at load time
{:ok, resnet} = Bumblebee.load_model({:hf, "microsoft/resnet-50"}, spec_overrides: [num_labels: 10])
```

### Response
#### Success Response
Returns `{:ok, model_info}` where `model_info` is a map containing:
- `:model`: The loaded model.
- `:params`: The model parameters.
- `:spec`: The model specification.

#### Response Example
```elixir
%{model: %Axon{...}, params: %{...}, spec: %Bumblebee.Text.Bert{...}}
```
```

--------------------------------

### Load a Diffusion Scheduler with Bumblebee

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Loads a noise scheduler (DDIM, PNDM, or LCM) for controlling the denoising process in diffusion models. Schedulers can be initialized and stepped through using provided functions.

```elixir
{:ok, scheduler} =
  Bumblebee.load_scheduler({:hf, "CompVis/stable-diffusion-v1-4", subdir: "scheduler"})

# Schedulers can also be used directly. `sample_template`, `prng_key`, `sample`,
# and `prediction` are not defined here; they are supplied by the caller's denoising loop.
{state, timesteps} = Bumblebee.scheduler_init(scheduler, 50, sample_template, prng_key)
{state, prev_sample} = Bumblebee.scheduler_step(scheduler, state, sample, prediction)
```

--------------------------------

### Configure Bumblebee Components

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Use `Bumblebee.configure/2` to build or update configuration structs for model specs, featurizers, schedulers, tokenizers, or generation configs. This function allows for customization of various parameters.

```elixir
# Build a model spec from scratch
spec = Bumblebee.configure(Bumblebee.Vision.ResNet,
  architecture: :for_image_classification,
  num_labels: 200
)
```

```elixir
# Build a featurizer configuration from its module
featurizer = Bumblebee.configure(Bumblebee.Vision.ConvNextFeaturizer, resize_method: :bilinear)
```

```elixir
# Update generation config
generation_config = Bumblebee.configure(generation_config,
  max_new_tokens: 50,
  min_new_tokens: 5
)
```

```elixir
# Build model from spec
model = Bumblebee.build_model(spec)
```

--------------------------------

### Load Model with Subdirectory

Source: https://github.com/elixir-nx/bumblebee/blob/main/README.md

When a repository contains multiple models, specify the subdirectory containing the desired model.

```elixir
Bumblebee.load_model({:hf, "model-repo", subdir: "..."})
```

--------------------------------

### Qwen3 Text Reranking with Bumblebee

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Perform text reranking using Qwen3 instruction-tuned models to score query-document relevance. Outputs normalized relevance probabilities and supports batching.

```elixir
{:ok, model_info} = Bumblebee.load_model({:hf, "Qwen/Qwen3-Reranker-0.6B"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "Qwen/Qwen3-Reranker-0.6B"})

serving =
  Bumblebee.Text.text_reranking_qwen3(model_info, tokenizer,
    compile: [batch_size: 4, sequence_length: 512],
    defn_options: [compiler: EXLA]
  )

query = "What is the capital of France?"
documents = [
  "Paris is the capital of France.",
  "Berlin is the capital of Germany."
]

pairs = Enum.map(documents, &{query, &1})
Nx.Serving.run(serving, pairs)
#=> %{scores: [%{score: 0.98, query: "...", document: "Paris is..."}, ...]}
```
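Reranking usually ends with ordering the documents by their score. The sketch below does that with `Enum.sort_by/3`. It assumes the result shape shown in the output comment above, and the scores are made-up sample values rather than real model output.

```elixir
query = "What is the capital of France?"

# Hypothetical result in the shape shown above; scores are illustrative only.
result = %{
  scores: [
    %{score: 0.02, query: query, document: "Berlin is the capital of Germany."},
    %{score: 0.98, query: query, document: "Paris is the capital of France."}
  ]
}

# Order documents from most to least relevant.
ranked =
  result.scores
  |> Enum.sort_by(& &1.score, :desc)
  |> Enum.map(& &1.document)

IO.inspect(ranked)
```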

--------------------------------

### Generate Images from Text with Stable Diffusion

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Generate images from text prompts using Stable Diffusion. Requires loading CLIP text encoder, UNet, VAE decoder, scheduler, and optionally a safety checker. Supports negative prompts and fixed seeds for reproducibility.

```elixir
repository_id = "CompVis/stable-diffusion-v1-4"

{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/clip-vit-large-patch14"})
{:ok, clip}      = Bumblebee.load_model({:hf, repository_id, subdir: "text_encoder"})
{:ok, unet}      = Bumblebee.load_model({:hf, repository_id, subdir: "unet"})
{:ok, vae}       = Bumblebee.load_model({:hf, repository_id, subdir: "vae"}, architecture: :decoder)
{:ok, scheduler} = Bumblebee.load_scheduler({:hf, repository_id, subdir: "scheduler"})
{:ok, featurizer}     = Bumblebee.load_featurizer({:hf, repository_id, subdir: "feature_extractor"})
{:ok, safety_checker} = Bumblebee.load_model({:hf, repository_id, subdir: "safety_checker"})

serving = 
  Bumblebee.Diffusion.StableDiffusion.text_to_image(clip, unet, vae, tokenizer, scheduler,
    num_steps: 20,
    num_images_per_prompt: 2,
    guidance_scale: 7.5,
    safety_checker: safety_checker,
    safety_checker_featurizer: featurizer,
    compile: [batch_size: 1, sequence_length: 60],
    defn_options: [compiler: EXLA]
  )

Nx.Serving.run(serving, "numbat in forest, detailed, digital art")
#=> %{
#=>   results: [
#=>     %{image: #Nx.Tensor<u8[512][512][3] ...>, is_safe: true},
#=>     %{image: #Nx.Tensor<u8[512][512][3] ...>, is_safe: true}
#=>   ]
#=> }

# With negative prompt and fixed seed for reproducibility
Nx.Serving.run(serving, %{
  prompt: "a serene mountain lake at sunset",
  negative_prompt: "ugly, blurry, low quality",
  seed: 42
})
```

--------------------------------

### Load Model from Hugging Face Hub

Source: https://github.com/elixir-nx/bumblebee/blob/main/README.md

Use `Bumblebee.load_model/2` to load a model from the Hugging Face Hub. The model type must be implemented in Bumblebee.

```elixir
Bumblebee.load_model({:hf, "model-repo"})
```

--------------------------------

### Configure Progress Bar Step

Source: https://github.com/elixir-nx/bumblebee/blob/main/README.md

Set how often the progress bar updates. A value of 10 updates it every 10% instead of every 1%.

```elixir
config :bumblebee, :progress_bar_step, 10
```

--------------------------------

### Bumblebee.load_scheduler/2

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Loads a noise scheduler (DDIM, PNDM, or LCM) used to control the denoising process in diffusion models.

```APIDOC
## `Bumblebee.load_scheduler/2` — Load a diffusion scheduler

Loads a noise scheduler (DDIM, PNDM, or LCM) used to control the denoising process in diffusion models.

```elixir
{:ok, scheduler} = Bumblebee.load_scheduler({:hf, "CompVis/stable-diffusion-v1-4", subdir: "scheduler"})

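# Schedulers can also be used directly. `sample_template`, `prng_key`, `sample`,
# and `prediction` are not defined here; they are supplied by the caller's denoising loop.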
{state, timesteps} = Bumblebee.scheduler_init(scheduler, 50, sample_template, prng_key)
{state, prev_sample} = Bumblebee.scheduler_step(scheduler, state, sample, prediction)
```
```

--------------------------------

### Bumblebee.Vision.image_to_text/5

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Generates natural language descriptions of images using multimodal models like BLIP. This function takes the loaded model, featurizer, tokenizer, and generation configuration as arguments.

```APIDOC
## Bumblebee.Vision.image_to_text/5 

### Description
Generates natural language descriptions of images using multimodal models such as BLIP.

### Function Signature
`Bumblebee.Vision.image_to_text(blip, featurizer, tokenizer, generation_config, opts \\ [])`

### Parameters
- `blip`: The loaded BLIP model.
- `featurizer`: The featurizer for the model.
- `tokenizer`: The tokenizer for the model.
- `generation_config`: The generation configuration.
- `opts`: Optional arguments, such as `defn_options`.

### Request Example
```elixir
{:ok, blip} = Bumblebee.load_model({:hf, "Salesforce/blip-image-captioning-base"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "Salesforce/blip-image-captioning-base"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "Salesforce/blip-image-captioning-base"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "Salesforce/blip-image-captioning-base"})

serving = Bumblebee.Vision.image_to_text(blip, featurizer, tokenizer, generation_config, defn_options: [compiler: EXLA])
image = StbImage.read_file!("/path/to/cat_on_chair.jpg")
Nx.Serving.run(serving, image)
```

### Response Example
```elixir
%{results: [%{text: "a cat sitting on a chair"}]}
```
```

--------------------------------

### Bumblebee.Text.text_reranking_qwen3/3

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Scores query-document relevance using Qwen3 instruction-tuned reranker models, outputting normalized relevance probabilities.

```APIDOC
## Bumblebee.Text.text_reranking_qwen3/3 - Text reranking with Qwen3 reranker models

### Description
Scores query-document relevance using Qwen3 instruction-tuned reranker models, outputting normalized relevance probabilities.

### Method Signature
`Bumblebee.Text.text_reranking_qwen3(model_info, tokenizer, opts)`

### Parameters
- `model_info`: Loaded model information.
- `tokenizer`: Loaded tokenizer.
- `opts`: Compilation and definition options, e.g., `compile: [batch_size: 4, sequence_length: 512]`, `defn_options: [compiler: EXLA]`.

### Request Example
```elixir
{:ok, model_info} = Bumblebee.load_model({:hf, "Qwen/Qwen3-Reranker-0.6B"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "Qwen/Qwen3-Reranker-0.6B"})

serving = Bumblebee.Text.text_reranking_qwen3(model_info, tokenizer,
  compile: [batch_size: 4, sequence_length: 512],
  defn_options: [compiler: EXLA]
)

query = "What is the capital of France?"
documents = [
  "Paris is the capital of France.",
  "Berlin is the capital of Germany."
]

pairs = Enum.map(documents, &{query, &1})
Nx.Serving.run(serving, pairs)
```

### Response Example
```elixir
%{scores: [%{score: 0.98, query: "...", document: "Paris is..."}, ...]}
```
```

--------------------------------

### Bumblebee.Text.generation/4

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Generates text continuations from a given prompt. This function supports both standard batch generation and streaming output for real-time applications.

```APIDOC
## `Bumblebee.Text.generation/4` — Prompt-driven text generation

Generates text continuations from a prompt. Supports streaming output for real-time use cases.

### Standard Generation

```elixir
{:ok, model_info} = Bumblebee.load_model({:hf, "openai-community/gpt2"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai-community/gpt2"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai-community/gpt2"})
generation_config = Bumblebee.configure(generation_config, max_new_tokens: 15)

serving = Bumblebee.Text.generation(model_info, tokenizer, generation_config)

Nx.Serving.run(serving, "Elixir is a functional")
#=> %{
#=>   results: [
#=>     %{text: " programming language that is designed to be used in a variety of applications. It", token_summary: %{input: 5, output: 15, padding: 0}}
#=>   ]
#=> }
```

### Streaming Generation

```elixir
# Streaming generation - returns a lazy stream of text chunks
serving = Bumblebee.Text.generation(model_info, tokenizer, generation_config, stream: true)

Nx.Serving.run(serving, "Elixir is a functional") |> Enum.to_list()
#=> [" programming", " language", " that", " is", " designed", " to", " be", " used",
#=>  " in", " a", " variety", " of", " applications.", " It"]
```
```
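
With `stream: true`, the result is an enumerable, so chunks can be handled as they arrive instead of being collected with `Enum.to_list/1`. The sketch below substitutes a plain list (the first chunks from the example output above) for the live stream so that it runs without a model. With a real serving, `chunks` would be the value returned by `Nx.Serving.run/2`.

```elixir
# Stand-in for the stream returned by Nx.Serving.run/2 with `stream: true`.
chunks = [" programming", " language", " that", " is", " designed"]

# Print each chunk as soon as it is produced and accumulate the full text.
text =
  Enum.reduce(chunks, "", fn chunk, acc ->
    IO.write(chunk)
    acc <> chunk
  end)

IO.puts("")
IO.inspect(text, label: "full text")
```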

--------------------------------

### Load Tokenizer with Revision

Source: https://github.com/elixir-nx/bumblebee/blob/main/README.md

Load a tokenizer from a specific repository revision. This is useful when a generated `tokenizer.json` is only available in a pull request.

```elixir
Bumblebee.load_tokenizer({:hf, "model-repo", revision: "..."})
```

--------------------------------

### Bumblebee.Text.zero_shot_classification/4

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Classifies text into arbitrary user-supplied labels without any task-specific fine-tuning, using natural language inference.

```APIDOC
## Bumblebee.Text.zero_shot_classification/4 - Zero-shot classification

### Description
Classifies text into arbitrary user-supplied labels without any task-specific fine-tuning, using natural language inference.

### Method Signature
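`Bumblebee.Text.zero_shot_classification(model, tokenizer, labels, opts \\ [])`

### Parameters
- `model`: A loaded model trained on natural language inference (NLI).
- `tokenizer`: A loaded tokenizer.
- `labels`: A list of arbitrary labels to classify the text into.
- `opts`: Optional keyword list of serving options, e.g., `compile` and `defn_options`.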

### Request Example
```elixir
{:ok, model} = Bumblebee.load_model({:hf, "facebook/bart-large-mnli"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "facebook/bart-large-mnli"})

labels = ["cooking", "traveling", "dancing"]
serving = Bumblebee.Text.zero_shot_classification(model, tokenizer, labels)

Nx.Serving.run(serving, "One day I will see the world")
```

### Response Example
```elixir
%{predictions: [
  %{label: "cooking", score: 0.0070497458800673485},
  %{label: "traveling", score: 0.985000491142273},
  %{label: "dancing", score: 0.007949736900627613}
]}
```
```

--------------------------------

### Zero-Shot Text Classification with Bumblebee

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Classify text into user-defined labels without fine-tuning, leveraging natural language inference. Requires a model trained on NLI tasks and a list of target labels.

```elixir
{:ok, model} = Bumblebee.load_model({:hf, "facebook/bart-large-mnli"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "facebook/bart-large-mnli"})

labels = ["cooking", "traveling", "dancing"]
serving = Bumblebee.Text.zero_shot_classification(model, tokenizer, labels)

Nx.Serving.run(serving, "One day I will see the world")
#=> %{
#=>   predictions: [
#=>     %{label: "cooking", score: 0.0070497458800673485},
#=>     %{label: "traveling", score: 0.985000491142273},
#=>     %{label: "dancing", score: 0.007949736900627613}
#=>   ]
#=> }
```
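
The predictions come back as a list with one entry per label, so selecting a final label is left to the caller. The sketch below is one way to do that, using the example output above as a stand-in for the result of `Nx.Serving.run/2`. The `0.5` threshold is an arbitrary value chosen for illustration.

```elixir
# Result copied from the example output above.
result = %{
  predictions: [
    %{label: "cooking", score: 0.0070497458800673485},
    %{label: "traveling", score: 0.985000491142273},
    %{label: "dancing", score: 0.007949736900627613}
  ]
}

# Pick the label with the highest score.
best = Enum.max_by(result.predictions, & &1.score)

# Optionally keep only labels above a confidence threshold.
# The 0.5 cutoff is an arbitrary value chosen for illustration.
confident = for %{label: label, score: score} <- result.predictions, score > 0.5, do: label

IO.inspect(best.label, label: "best")
IO.inspect(confident, label: "confident")
```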

--------------------------------

### Bumblebee.Diffusion.StableDiffusion.text_to_image/6

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Generates images from text prompts using Stable Diffusion. This function requires multiple submodels including CLIP text encoder, UNet, VAE decoder, and a noise scheduler, with optional safety checker integration.

```APIDOC
## Bumblebee.Diffusion.StableDiffusion.text_to_image/6 - Text-to-image generation with Stable Diffusion

### Description
Generates images from text prompts using Stable Diffusion. Requires loading multiple submodels: CLIP text encoder, UNet, VAE decoder, and a noise scheduler. Optionally integrates a safety checker.

### Function Signature
`Bumblebee.Diffusion.StableDiffusion.text_to_image(clip, unet, vae, tokenizer, scheduler, opts \\ [])`

### Parameters
- `clip`: The loaded CLIP text encoder model.
- `unet`: The loaded UNet model.
- `vae`: The loaded VAE decoder model.
- `tokenizer`: The loaded tokenizer.
- `scheduler`: The loaded noise scheduler.
- `opts`: Optional arguments, including `num_steps`, `num_images_per_prompt`, `guidance_scale`, `safety_checker`, `safety_checker_featurizer`, `compile`, and `defn_options`.

### Request Example (Basic)
```elixir
repository_id = "CompVis/stable-diffusion-v1-4"

{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/clip-vit-large-patch14"})
{:ok, clip}      = Bumblebee.load_model({:hf, repository_id, subdir: "text_encoder"})
{:ok, unet}      = Bumblebee.load_model({:hf, repository_id, subdir: "unet"})
{:ok, vae}       = Bumblebee.load_model({:hf, repository_id, subdir: "vae"}, architecture: :decoder)
{:ok, scheduler} = Bumblebee.load_scheduler({:hf, repository_id, subdir: "scheduler"})
{:ok, featurizer}     = Bumblebee.load_featurizer({:hf, repository_id, subdir: "feature_extractor"})
{:ok, safety_checker} = Bumblebee.load_model({:hf, repository_id, subdir: "safety_checker"})

serving =
  Bumblebee.Diffusion.StableDiffusion.text_to_image(clip, unet, vae, tokenizer, scheduler,
    num_steps: 20,
    num_images_per_prompt: 2,
    guidance_scale: 7.5,
    safety_checker: safety_checker,
    safety_checker_featurizer: featurizer,
    compile: [batch_size: 1, sequence_length: 60],
    defn_options: [compiler: EXLA]
  )

Nx.Serving.run(serving, "numbat in forest, detailed, digital art")
```

### Response Example (Basic)
```elixir
%{
  results: [
    %{image: #Nx.Tensor<u8[512][512][3] ...>, is_safe: true},
    %{image: #Nx.Tensor<u8[512][512][3] ...>, is_safe: true}
  ]
}
```

### Request Example (With Negative Prompt and Seed)
```elixir
# Assuming 'serving' is already defined as above
Nx.Serving.run(serving, %{
  prompt: "a serene mountain lake at sunset",
  negative_prompt: "ugly, blurry, low quality",
  seed: 42
})
```
```

--------------------------------

### Bumblebee.Text.question_answering/3

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Extracts the answer span directly from a context passage given a question. Returns ranked answer candidates with positional offsets.

```APIDOC
## Bumblebee.Text.question_answering/3 - Extractive question answering

### Description
Extracts the answer span directly from a context passage given a question. Returns ranked answer candidates with positional offsets.

### Method Signature
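`Bumblebee.Text.question_answering(roberta, tokenizer, opts \\ [])`

### Parameters
- `roberta`: A loaded question answering model.
- `tokenizer`: A loaded tokenizer.
- `opts`: Optional keyword list of serving options, e.g., `compile` and `defn_options`.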

### Request Example
```elixir
{:ok, roberta} = Bumblebee.load_model({:hf, "deepset/roberta-base-squad2"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "FacebookAI/roberta-base"})

serving = Bumblebee.Text.question_answering(roberta, tokenizer)

Nx.Serving.run(serving, %{
  question: "What's my name?",
  context: "My name is Sarah and I live in London."
})
```

### Response Example
```elixir
%{results: [%{end: 16, score: 0.81039959192276, start: 11, text: "Sarah"}]}
```
```
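
The `start` and `end` values locate the answer inside the context passage. The sketch below recovers the answer span from those offsets, using the Response Example above as a stand-in for the result of `Nx.Serving.run/2`. It assumes that `end` is exclusive, which matches the example values; for non-ASCII text, whether the offsets count bytes or characters should be checked against the library documentation.

```elixir
context = "My name is Sarah and I live in London."

# Result copied from the Response Example above.
result = %{results: [%{end: 16, score: 0.81039959192276, start: 11, text: "Sarah"}]}

# Take the single candidate and read its offsets.
[%{start: start_offset, end: end_offset, text: answer}] = result.results

# Assumption: offsets index into the context, with `end` exclusive.
span = String.slice(context, start_offset, end_offset - start_offset)

IO.inspect({span, answer})
```

Offsets like these are useful for highlighting the answer in the original passage rather than displaying the extracted text on its own.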

--------------------------------

### Extract Image Embeddings with CLIP

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Extract dense vector representations of images for retrieval or similarity search. Requires a CLIP vision model and its featurizer. Setting `embedding_processor: :l2_norm` L2-normalizes the resulting embedding.

```elixir
{:ok, clip} =
  Bumblebee.load_model({:hf, "openai/clip-vit-base-patch32"},
    module: Bumblebee.Vision.ClipVision
  )
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/clip-vit-base-patch32"})

serving = Bumblebee.Vision.image_embedding(clip, featurizer,
  embedding_processor: :l2_norm
)

image = StbImage.read_file!("/path/to/image.jpg")
Nx.Serving.run(serving, image)
#=> %{
#=>   embedding: #Nx.Tensor<
#=>     f32[768]
#=>     [-0.43403682112693787, 0.09786412119865417, ...]
#=>   >
#=> }
```
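
When embeddings are L2-normalized, the cosine similarity of two embeddings reduces to their dot product. The sketch below illustrates this with plain Elixir lists in place of `Nx` tensors so that it runs without dependencies. The `EmbeddingMath` module and the toy vectors are hypothetical and not part of Bumblebee.

```elixir
defmodule EmbeddingMath do
  # Scales a vector to unit length (L2 norm of 1).
  def l2_normalize(vector) do
    norm = :math.sqrt(dot(vector, vector))
    Enum.map(vector, &(&1 / norm))
  end

  # Dot product of two equal-length vectors.
  def dot(a, b) do
    a
    |> Enum.zip(b)
    |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)
  end
end

# Toy 2-dimensional "embeddings"; real CLIP embeddings have many more dimensions.
a = EmbeddingMath.l2_normalize([3.0, 4.0])
b = EmbeddingMath.l2_normalize([4.0, 3.0])

# For unit vectors the dot product equals the cosine similarity.
similarity = EmbeddingMath.dot(a, b)
IO.inspect(similarity, label: "cosine similarity")
```

Normalizing once at embedding time means a similarity search only needs dot products at query time.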

--------------------------------

### Fill Mask Task with Bumblebee

Source: https://context7.com/elixir-nx/bumblebee/llms.txt

Use `Bumblebee.Text.fill_mask/3` to predict tokens for a `[MASK]` placeholder. Load the model and tokenizer first. The `top_k` option controls the number of predictions.

```elixir
{:ok, bert} = Bumblebee.load_model({:hf, "google-bert/bert-base-uncased"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "google-bert/bert-base-uncased"})

serving = Bumblebee.Text.fill_mask(bert, tokenizer, top_k: 5)

Nx.Serving.run(serving, "The capital of [MASK] is Paris.")
#=> %{
#=>   predictions: [
#=>     %{score: 0.9279842972755432, token: "france"},
#=>     %{score: 0.008412551134824753, token: "brittany"},
#=>     %{score: 0.007433671969920397, token: "algeria"},
#=>     %{score: 0.004957548808306456, token: "department"},
#=>     %{score: 0.004369721747934818, token: "reunion"}
#=>   ]
#=> }
```
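
The serving returns only the predicted tokens, so rebuilding full sentences is up to the caller. The sketch below substitutes each token back into the input, using the first two predictions from the example output above as a stand-in for the result of `Nx.Serving.run/2`.

```elixir
text = "The capital of [MASK] is Paris."

# First two predictions copied from the example output above.
result = %{
  predictions: [
    %{score: 0.9279842972755432, token: "france"},
    %{score: 0.008412551134824753, token: "brittany"}
  ]
}

# Substitute each predicted token into the original sentence.
candidates =
  Enum.map(result.predictions, fn %{token: token} ->
    String.replace(text, "[MASK]", token)
  end)

IO.inspect(candidates)
```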