### Quickstart Example: Train and Predict

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/quickstart.rst

Demonstrates loading data, initializing a pipeline, training with AutoML, and making predictions.

```python
from autointent import Dataset, Pipeline

# Prepare your data
data = {
    "train": [
        {"utterance": "I want to check my account balance", "label": 0},
        {"utterance": "How do I transfer money?", "label": 1},
        {"utterance": "What's my current balance?", "label": 0},
        {"utterance": "I need to send money to my friend", "label": 1},
        {"utterance": "Can you help me make a payment?", "label": 1},
        {"utterance": "Show me my transaction history", "label": 0},
        {"utterance": "Can you show me my account details?", "label": 0},
        {"utterance": "I want to send funds to someone", "label": 1},
        {"utterance": "What is my available balance?", "label": 0},
        {"utterance": "How can I make a transfer?", "label": 1},
        {"utterance": "Please help me with a payment", "label": 1},
        {"utterance": "I need to view my recent transactions", "label": 0}
    ],
    "validation": [
        {"utterance": "Display my account info", "label": 0},
        {"utterance": "I want to transfer funds", "label": 1}
    ]
}

# Load data into AutoIntent
dataset = Dataset.from_dict(data)

# Initialize and train the AutoML pipeline
pipeline = Pipeline.from_preset("classic-light")
pipeline.fit(dataset)

# Make predictions on new data
predictions = pipeline.predict([
    "What is my available balance?",
    "Transfer money to John"
])
```
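
Before passing a dictionary like the one above to `Dataset.from_dict`, it can help to sanity-check its shape. The following is a minimal sketch in plain Python. `summarize_splits` is a hypothetical helper, not part of AutoIntent, and it assumes every top-level key is a split whose samples carry a string `utterance` and an integer `label`, as in the quickstart data.

```python
from collections import Counter


def summarize_splits(data: dict) -> dict:
    """Return per-split label counts, raising on malformed samples.

    Hypothetical helper for illustration; not an AutoIntent API.
    """
    summary = {}
    for split_name, samples in data.items():
        counts = Counter()
        for idx, sample in enumerate(samples):
            if not isinstance(sample.get("utterance"), str):
                raise ValueError(f"{split_name}[{idx}]: 'utterance' must be a string")
            if not isinstance(sample.get("label"), int):
                raise ValueError(f"{split_name}[{idx}]: 'label' must be an integer")
            counts[sample["label"]] += 1
        summary[split_name] = dict(counts)
    return summary


example = {
    "train": [
        {"utterance": "I want to check my account balance", "label": 0},
        {"utterance": "How do I transfer money?", "label": 1},
        {"utterance": "What's my current balance?", "label": 0},
    ],
    "validation": [
        {"utterance": "I want to transfer funds", "label": 1},
    ],
}

print(summarize_splits(example))
```

A check like this surfaces string labels or missing keys early, before any model training starts.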

--------------------------------

### Install Project Dependencies

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Installs all project dependencies using the make install command.

```bash
make install
```

--------------------------------

### Install AutoIntent

Source: https://github.com/deeppavlov/autointent/blob/dev/README.md

Install the AutoIntent library using pip.

```bash
pip install autointent
```

--------------------------------

### Install AutoIntent with OpenAI Support

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/intent_description.rst

Installs AutoIntent with the 'openai' extra, which adds the dependencies needed for OpenAI integration.

```bash
pip install "autointent[openai]"
```

--------------------------------

### Install AutoIntent with Weights & Biases Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with the 'wandb' extra for Weights & Biases experiment logging integration.

```bash
pip install "autointent[wandb]"
```

--------------------------------

### Install AutoIntent with CatBoost Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with the 'catboost' extra, enabling CatBoostScorer and CatBoost-based tuning paths.

```bash
pip install "autointent[catboost]"
```

--------------------------------

### Install uv Dependency Manager

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Installs the uv dependency manager using a curl script. Refer to uv documentation for detailed installation instructions.

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

--------------------------------

### Example .env Configuration

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/server.rst

Example environment file settings for AutoIntent servers, including pipeline path and optional host/port configurations for HTTP and MCP over HTTP.

```text
AUTOINTENT_PATH=/path/to/my_autointent_project
# Optional HTTP defaults:
# AUTOINTENT_HOST=0.0.0.0
# AUTOINTENT_PORT=8013
# Optional MCP over HTTP:
# AUTOINTENT_TRANSPORT=http
# AUTOINTENT_PORT=8012
```
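
To make the role of the optional variables concrete, here is a rough sketch of reading such a file and falling back to the HTTP defaults quoted in the comments above. It uses only the standard library. `parse_env_text` and `resolve_server_settings` are hypothetical names, and how the actual AutoIntent server loads its settings is not shown in the source.

```python
def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_server_settings(env: dict) -> dict:
    """Fill in host and port, falling back to the HTTP defaults from the example file."""
    return {
        "path": env["AUTOINTENT_PATH"],
        "host": env.get("AUTOINTENT_HOST", "0.0.0.0"),
        "port": int(env.get("AUTOINTENT_PORT", "8013")),
    }


env_text = """
AUTOINTENT_PATH=/path/to/my_autointent_project
# AUTOINTENT_HOST=0.0.0.0
AUTOINTENT_PORT=9000
"""

settings = resolve_server_settings(parse_env_text(env_text))
print(settings)
```

Here the commented-out host falls back to its default, while the uncommented port overrides it.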

--------------------------------

### Install AutoIntent with DSPy Support

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/dspy_augmentation.rst

Installs AutoIntent with the 'dspy' extra, which adds the dependencies needed for DSPy-based augmentation.

```bash
pip install "autointent[dspy]"
```

--------------------------------

### Development Install from Git

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Clones the AutoIntent repository and installs all development dependencies using make. This provides the full contributor set.

```bash
git clone https://github.com/deeppavlov/AutoIntent.git
cd AutoIntent
make install
```

--------------------------------

### Create Optuna Study with Warm Starting

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/learn/automl_theory.rst

Initialize an Optuna study, enabling warm starting by loading an existing study if it is found. This allows resuming interrupted optimization processes.

```python
import optuna

# Optimization state is automatically saved
study = optuna.create_study(
    study_name="intent_classification",
    storage="sqlite:///optuna.db",
    load_if_exists=True
)
```

--------------------------------

### Install AutoIntent with Transformers and PEFT

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with both 'transformers' and 'peft' extras for transformer presets and fine-tuning.

```bash
pip install "autointent[transformers,peft]"
```

--------------------------------

### Install AutoIntent with vLLM Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with the 'vllm' extra, enabling vLLM as an optional high-throughput inference backend where supported.

```bash
pip install "autointent[vllm]"
```

--------------------------------

### Install AutoIntent with OpenSearch Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with the 'opensearch' extra, enabling the OpenSearch client for OpenSearch-backed vector and retrieval integrations.

```bash
pip install "autointent[opensearch]"
```

--------------------------------

### Install AutoIntent with FastMCP Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with the 'fastmcp' extra for FastMCP-based MCP server integration.

```bash
pip install "autointent[fastmcp]"
```

--------------------------------

### Install AutoIntent with CodeCarbon Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with the 'codecarbon' extra for CodeCarbon energy and emissions tracking during runs.

```bash
pip install "autointent[codecarbon]"
```

--------------------------------

### Install AutoIntent with FastAPI Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with the 'fastapi' extra, including the HTTP serving stack (FastAPI, Uvicorn) for the AutoIntent server mode.

```bash
pip install "autointent[fastapi]"
```

--------------------------------

### Install AutoIntent with Sentence Transformers and CatBoost

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with both 'sentence-transformers' and 'catboost' extras for classic embedding and gradient boosting pipelines.

```bash
pip install "autointent[sentence-transformers,catboost]"
```

--------------------------------

### Configure OpenAI Client for AutoIntent

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/intent_description.rst

Configures an OpenAI-compatible async client with a custom base URL and API key for use with AutoIntent.

```python
import openai

client = openai.AsyncOpenAI(
    base_url="your-api-base-url",
    api_key="your-api-key"
)
```

--------------------------------

### Run HTTP Server with Uvicorn

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/server.rst

Start the AutoIntent HTTP server using Uvicorn, specifying the module path and host/port.

```bash
uvicorn autointent.server.http:app --host 127.0.0.1 --port 8013
```

--------------------------------

### Install AutoIntent with PEFT Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with the 'peft' extra, enabling parameter-efficient fine-tuning methods like LoRA when used with transformer presets.

```bash
pip install "autointent[peft]"
```

--------------------------------

### Setup Generator and Template

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Initializes the Generator and EnglishSynthesizerTemplate required by DatasetBalancer. The generator uses an LLM for utterance creation, and the template defines the prompt format.

```python
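from autointent.generation.utterances.generator import Generator
from autointent.generation.chat_templates import EnglishSynthesizerTemplate

# Initialize a generator (uses OpenAI API by default)
generator = Generator()

# Create a template for generating utterances (`dataset` is an existing AutoIntent Dataset)
template = EnglishSynthesizerTemplate(dataset=dataset, split="train")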
```

--------------------------------

### Install AutoIntent with Sentence Transformers Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent along with the 'sentence-transformers' extra, enabling SentenceTransformer embedders and related pipelines.

```bash
pip install "autointent[sentence-transformers]"
```

--------------------------------

### Install AutoIntent with Transformers Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with the 'transformers' extra, enabling Hugging Face transformers models for transformer presets and modules.

```bash
pip install "autointent[transformers]"
```

--------------------------------

### Run MCP Server (Stdio Transport)

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/server.rst

Start the AutoIntent MCP server using the default stdio transport via its Python module entrypoint.

```bash
python -c "from autointent.server.mcp import main; main()"
```

--------------------------------

### Install VSCode Ruff Extension

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Links to the Ruff extension for VSCode, which flags code style errors directly in the editor.

```text
https://marketplace.visualstudio.com/items?itemName=charliermarsh.ruff
```

--------------------------------

### Build and Train Intent Classifier

Source: https://github.com/deeppavlov/autointent/blob/dev/README.md

Example of building an intent classifier using AutoIntent. Load a dataset, select a preset pipeline, and train it.

```python
from autointent import Pipeline, Dataset

dataset = Dataset.from_json(path_to_json)
pipeline = Pipeline.from_preset("classic-light")
pipeline.fit(dataset)
pipeline.predict(["show me my latest transactions"])
```

--------------------------------

### Minimal Sketch for Adversarial Augmentation

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/adversarial.rst

This Python code demonstrates a minimal setup for adversarial human-like augmentation. It initializes a Dataset, an LLM Generator, a CriticHumanLike, and a HumanUtteranceGenerator to augment training data.

```python
from autointent import Dataset
from autointent.generation import Generator
from autointent.generation.utterances import CriticHumanLike, HumanUtteranceGenerator

dataset = Dataset.from_dict({...})  # your train split, with intent names if you use them in prompts

llm = Generator(model_name="gpt-4o-mini")
critic = CriticHumanLike(generator=llm)
augmenter = HumanUtteranceGenerator(generator=llm, critic=critic, async_mode=False)

new_samples = augmenter.augment(dataset, split_name="train", n_final_per_class=3)
```

--------------------------------

### Create Sample Imbalanced Dataset

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Defines a sample dataset with imbalanced class distribution for demonstration purposes. Includes intents and training examples.

```python
from autointent import Dataset
from autointent.generation.utterances.balancer import DatasetBalancer
from autointent.generation.utterances.generator import Generator
from autointent.generation.chat_templates import EnglishSynthesizerTemplate

# Create a simple imbalanced dataset
sample_data = {
    "intents": [
        {"id": 0, "name": "restaurant_booking", "description": "Booking a table at a restaurant"},
        {"id": 1, "name": "weather_query", "description": "Checking weather conditions"},
        {"id": 2, "name": "navigation", "description": "Getting directions to a location"},
    ],
    "train": [
        # Restaurant booking examples (5)
        {"utterance": "Book a table for two tonight", "label": 0},
        {"utterance": "I need a reservation at Le Bistro", "label": 0},
        {"utterance": "Can you reserve a table for me?", "label": 0},
        {"utterance": "I want to book a restaurant for my anniversary", "label": 0},
        {"utterance": "Make a dinner reservation for 8pm", "label": 0},

        # Weather query examples (3)
        {"utterance": "What's the weather like today?", "label": 1},
        {"utterance": "Will it rain tomorrow?", "label": 1},
        {"utterance": "Weather forecast for New York", "label": 1},

        # Navigation example (1)
        {"utterance": "How do I get to the museum?", "label": 2},
    ]
}

# Create the dataset
dataset = Dataset.from_dict(sample_data)
```

--------------------------------

### Examine Generated Examples for a Specific Class

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Retrieves and prints original and generated utterances for a specified class ID. Helps in quality control and understanding the augmentation process.

```python
# Navigation class (Class 2)
navigation_class_id = 2
intent = next(i for i in dataset.intents if i.id == navigation_class_id)

print(f"Examples for class {navigation_class_id} ({intent.name}):")

# Original examples
original_examples = [
    s[Dataset.utterance_feature] for s in dataset["train"] if s[Dataset.label_feature] == navigation_class_id
]
print("\nOriginal examples:")
for i, example in enumerate(original_examples, 1):
    print(f"{i}. {example}")

# Generated examples
all_examples = [
    s[Dataset.utterance_feature] for s in balanced_dataset["train"] if s[Dataset.label_feature] == navigation_class_id
]
generated_examples = [ex for ex in all_examples if ex not in original_examples]
print("\nGenerated examples:")
for i, example in enumerate(generated_examples, 1):
    print(f"{i}. {example}")
```

--------------------------------

### Run Specific Project Test

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Runs a specific test file using pytest, with 'tests/modules/scoring/test_bert.py' as an example.

```bash
uv run pytest tests/modules/scoring/test_bert.py
```

--------------------------------

### AutoML Pipeline Predictions

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/quickstart.rst

Perform batch predictions using a trained AutoIntent pipeline. Provide a list of user utterances to get intent classification results.

```python
# Batch predictions
results = pipeline.predict([
    "What's my account balance?",
    "Transfer $100 to John",
    "Show me recent transactions"
])
```

--------------------------------

### Build and Serve Documentation Locally

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Builds the HTML documentation and hosts it locally for preview.

```bash
make serve-docs
```

--------------------------------

### Build HTML Documentation

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Builds the HTML version of the documentation and places it in the 'docs/build' folder.

```bash
make docs
```

--------------------------------

### Dry-run Multi-Version Documentation Build

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Run this command locally to test the multi-version documentation build process before a release. Ensure you have the full git history and tags available.

```bash
make multi-version-docs
```

--------------------------------

### Run HTTP Server via Module Entrypoint

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/server.rst

Execute the HTTP server using its Python module entrypoint, which respects AUTOINTENT_HOST and AUTOINTENT_PORT settings.

```bash
python -c "from autointent.server.http import main; main()"
```

--------------------------------

### Load Pipeline Presets for Different Budgets

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/learn/automl_theory.rst

Loads predefined pipeline configurations ('classic-light', 'classic-heavy', 'zero-shot-encoders') optimized for speed, performance, or zero-shot capabilities.

```python
from autointent import Pipeline

# Different computational budgets
pipeline_light = Pipeline.from_preset("classic-light")    # Speed-focused
pipeline_heavy = Pipeline.from_preset("classic-heavy")    # Performance-focused

# Different model types
pipeline_zero_shot = Pipeline.from_preset("zero-shot-encoders")  # No training data
```

--------------------------------

### Check Type Hints

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Verifies type hints in the project using the make typing command.

```bash
make typing
```

--------------------------------

### Generate Intent Descriptions with AutoIntent

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/intent_description.rst

Demonstrates how to use the `generate_descriptions` function to enhance a dataset with LLM-generated intent descriptions. Requires an OpenAI client and a custom prompt template.

```python
import openai
from autointent import Dataset
from autointent.generation.intents import generate_descriptions
from autointent.generation.chat_templates import PromptDescription

client = openai.AsyncOpenAI(
    api_key="your-api-key"
)

dataset = Dataset.from_hub("AutoIntent/clinc150_subset")

prompt = PromptDescription(
    text="Describe intent {intent_name} with examples: {user_utterances} and patterns: {regex_patterns}",
)

enhanced_dataset = generate_descriptions(
    dataset=dataset,
    client=client,
    prompt=prompt,
    model_name="gpt-4o-mini",
)

enhanced_dataset.to_csv("enhanced_clinc150.csv")
```

--------------------------------

### Loading Data with AutoIntent

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/quickstart.rst

Shows how to load data into an AutoIntent Dataset from a dictionary, JSON file, or Hugging Face Hub.

```python
from autointent import Dataset

# From dictionary
dataset = Dataset.from_dict(data)

# From JSON file
dataset = Dataset.from_json("/path/to/your/data.json")

# From Hugging Face Hub
dataset = Dataset.from_hub("your-username/your-dataset")
```

--------------------------------

### AutoML Pipeline Presets

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/quickstart.rst

Initialize an AutoIntent pipeline from one of several presets, ranging from fast training to experimental transformer fine-tuning. Each assignment below overwrites the previous one, so keep only the preset you need; the selected pipeline is then trained on a dataset.

```python
from autointent import Pipeline

# Our quick and accurate SoTA
pipeline = Pipeline.from_preset("classic-light")

# If you have more training time
pipeline = Pipeline.from_preset("classic-heavy")

# Experimental preset with fine-tuning methods
pipeline = Pipeline.from_preset("transformers-light")

# Train the pipeline
pipeline.fit(dataset)
```

--------------------------------

### Initialize DatasetBalancer

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Creates an instance of DatasetBalancer with the specified generator, prompt maker, and balancing parameters. `max_samples_per_class` determines the target number of samples for each class.

```python
balancer = DatasetBalancer(
    generator=generator,
    prompt_maker=template,
    async_mode=False,  # Set to True for faster generation with async processing
    max_samples_per_class=5,  # Each class will have exactly 5 samples after balancing
)
```

--------------------------------

### Synchronize Documentation Dependencies

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Synchronizes dependencies for documentation builds, including extra groups like 'catboost', 'peft', 'transformers', 'sentence-transformers', and 'openai'. Pandoc is also required.

```bash
uv sync --group docs --extra catboost --extra peft --extra transformers --extra sentence-transformers --extra openai
```

--------------------------------

### Configure Dataset Balancing to Exact Sample Count

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Sets up a DatasetBalancer to ensure each class has exactly 10 samples. Requires a generator and a prompt maker.

```python
# To bring all classes to exactly 10 samples
original_dataset = Dataset.from_dict(sample_data)
exact_template = EnglishSynthesizerTemplate(dataset=original_dataset, split="train")

exact_balancer = DatasetBalancer(
    generator=generator,
    prompt_maker=exact_template,
    max_samples_per_class=10
)
```

--------------------------------

### Run Documentation Tests

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Executes doctests, similar to CI checks on PRs and pushes to the 'dev' branch.

```bash
make test-docs
```

--------------------------------

### Configure Hyperparameter Optimization

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/learn/automl_theory.rst

Set up the HPOConfig for AutoIntent, specifying the sampler, number of trials, startup trials, timeout, and parallel jobs.

```python
hpo_config = HPOConfig(
    sampler="tpe",
    n_trials=50,              # Total optimization budget
    n_startup_trials=10,      # Random initialization
    timeout=3600,             # 1-hour time limit
    n_jobs=4                  # Parallel trials
)
```

--------------------------------

### Configure Dataset Balancing to Max Sample Count

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Sets up a DatasetBalancer to balance classes to the level of the most represented class. `max_samples_per_class=None` achieves this.

```python
# Balance to the level of the most represented class
max_template = EnglishSynthesizerTemplate(dataset=original_dataset, split="train")

max_balancer = DatasetBalancer(
    generator=generator,
    prompt_maker=max_template,
    max_samples_per_class=None  # Will use the count of the most represented class
)
```
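
The effect of `max_samples_per_class` on the generation workload can be estimated with simple arithmetic. The sketch below is illustrative only: `samples_to_generate` is a hypothetical helper, and it assumes that classes already at or above the target are left untouched, which the source does not state. The counts are those of the sample dataset (5, 3, and 1 training utterances).

```python
def samples_to_generate(class_counts: dict, max_samples_per_class=None) -> dict:
    """Return how many new utterances each class needs to reach the target.

    Illustrative arithmetic only; assumes classes at or above the target stay as they are.
    """
    if max_samples_per_class is None:
        # Mirror the "balance to the most represented class" mode.
        target = max(class_counts.values())
    else:
        target = max_samples_per_class
    return {class_id: max(0, target - count) for class_id, count in class_counts.items()}


counts = {0: 5, 1: 3, 2: 1}  # restaurant_booking, weather_query, navigation

to_exact_ten = samples_to_generate(counts, max_samples_per_class=10)
to_majority = samples_to_generate(counts, max_samples_per_class=None)

print(to_exact_ten)  # {0: 5, 1: 7, 2: 9}
print(to_majority)   # {0: 0, 1: 2, 2: 4}
```

Since every generated utterance costs an LLM call, an estimate like this gives a quick sense of the budget before running the balancer.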

--------------------------------

### Clean and Rebuild Documentation

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Cleans the documentation build artifacts and then rebuilds the HTML documentation. Useful if the build is stale or links appear incorrect.

```bash
make clean-docs
make docs
```

--------------------------------

### Configure Search Space with KNN and Linear Modules

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/learn/automl_theory.rst

Defines a search space for hyperparameter tuning, specifying modules like 'knn' with parameter ranges for 'k' and 'weights', and 'linear' with 'cv' options.

```yaml
search_space:
  - node_type: scoring
    target_metric: scoring_f1
    search_space:
      - module_name: knn
        k:
          low: 1
          high: 20
        weights: [uniform, distance, closest]
      - module_name: linear
        cv: [3, 5, 10]
```
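
To illustrate how a search space of this shape can be interpreted, the sketch below draws random configurations from it. This is plain uniform random search, used as a stand-in rather than a reproduction of AutoIntent's optimizer. Reading `low`/`high` as an inclusive integer range and lists as categorical choices is an assumption, and `sample_config` is a hypothetical helper.

```python
import random

# The scoring search space from the YAML above, written as a Python structure.
scoring_space = [
    {"module_name": "knn", "k": {"low": 1, "high": 20}, "weights": ["uniform", "distance", "closest"]},
    {"module_name": "linear", "cv": [3, 5, 10]},
]


def sample_config(space: list, rng: random.Random) -> dict:
    """Draw one module and one value per hyperparameter (plain random search)."""
    module = rng.choice(space)
    config = {"module_name": module["module_name"]}
    for name, spec in module.items():
        if name == "module_name":
            continue
        if isinstance(spec, dict):
            # {"low": ..., "high": ...} is read here as an inclusive integer range.
            config[name] = rng.randint(spec["low"], spec["high"])
        else:
            # A list is read as a set of categorical choices.
            config[name] = rng.choice(spec)
    return config


rng = random.Random(0)
for _ in range(3):
    print(sample_config(scoring_space, rng))
```

Each printed dictionary corresponds to one candidate trial: a module name plus one concrete value for each of its tunable parameters.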

--------------------------------

### Build an Intent Classifier with AutoIntent

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/index.rst

Use this snippet to quickly build an intent classifier. It requires loading a dataset from a JSON file and fitting a pre-configured pipeline.

```python
from autointent import Pipeline, Dataset

dataset = Dataset.from_json(path_to_json)
pipeline = Pipeline.from_preset("classic-light")
pipeline.fit(dataset)
pipeline.predict(["show me my latest transactions"])
```

--------------------------------

### Lint and Format Code

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Checks code style and applies formatting using the make lint command.

```bash
make lint
```

--------------------------------

### Check Initial Class Distribution

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Calculates and prints the class distribution of the initial training dataset. This helps visualize the imbalance before applying the balancing process.

```python
# Check the initial distribution of classes in the training set
initial_distribution = {}
for sample in dataset["train"]:
    label = sample[Dataset.label_feature]
    initial_distribution[label] = initial_distribution.get(label, 0) + 1

print("Initial class distribution:")
for class_id, count in sorted(initial_distribution.items()):
    intent = next(i for i in dataset.intents if i.id == class_id)
    print(f"Class {class_id} ({intent.name}): {count} samples")

print(f"\nMost represented class: {max(initial_distribution.values())} samples")
print(f"Least represented class: {min(initial_distribution.values())} samples")
```
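
The same tally can be expressed more compactly with `collections.Counter`. The sketch below works on a plain list of labels rather than on a `Dataset` object, so it runs without AutoIntent installed. The label list mirrors the sample dataset, and the imbalance ratio is an extra summary added here for illustration.

```python
from collections import Counter

intent_names = {0: "restaurant_booking", 1: "weather_query", 2: "navigation"}
train_labels = [0, 0, 0, 0, 0, 1, 1, 1, 2]  # 5, 3 and 1 samples, as in the sample dataset

distribution = Counter(train_labels)
majority_id, majority_count = distribution.most_common(1)[0]
minority_id, minority_count = min(distribution.items(), key=lambda item: item[1])
imbalance_ratio = majority_count / minority_count

for class_id in sorted(distribution):
    print(f"Class {class_id} ({intent_names[class_id]}): {distribution[class_id]} samples")
print(f"Imbalance ratio (largest / smallest): {imbalance_ratio:.1f}")
```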

--------------------------------

### Augment Dataset using DSPYIncrementalUtteranceEvolver

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/dspy_augmentation.rst

Augment a dataset using the DSPYIncrementalUtteranceEvolver. Configure API keys, model, and augmentation parameters. Refer to LiteLLM documentation for model configuration.

```python
import os
os.environ["OPENAI_API_KEY"] = "your-api-key"

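from autointent import Dataset
from autointent.custom_types import Split
# Import path assumed from the other augmentation examples; the source snippet omits it
from autointent.generation.utterances import DSPYIncrementalUtteranceEvolver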

dataset = Dataset.from_hub("AutoIntent/clinc150_subset")
evolver = DSPYIncrementalUtteranceEvolver(
    "openai/gpt-4o-mini"
)

augmented_dataset = evolver.augment(
    dataset,
    split_name=Split.TEST,
    n_evolutions=1,
    mipro_init_params={
        "auto": "light",
    },
    mipro_compile_params={
        "minibatch": False,
    },
)

augmented_dataset.to_csv("clinc150_dspy_augment.csv")
```

--------------------------------

### MCP Server Tools

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/server.rst

The FastMCP server provides tools for prediction, class listing, and retrieving training data, accessible via stdio or HTTP transport.

```APIDOC
## MCP Tools

### predict

#### Description
Performs intent prediction on a list of utterances.

#### Arguments
- **utterances** (list[str]) - A list of text utterances to predict intents for.

#### Returns
- **predictions** (list) - A list of predictions, similar to the HTTP API.

### classes

#### Description
Retrieves a list of available intents (classes).

#### Arguments
- **page** (int) - Optional - The page number for pagination.
- **page_size** (int) - Optional - The number of items per page.

#### Returns
- **classes** (list[Intent]) - A list of Intent objects, each containing id, name, tags, regex fields, and description.
- **pagination_info** (object) - Information about the pagination.

### train_data

#### Description
Retrieves training data samples.

#### Arguments
- **page** (int) - Optional - The page number for pagination.
- **page_size** (int) - Optional - The number of items per page.
- **class_filter** (list[int]) - Optional - A list of class IDs to filter the training data by.

#### Returns
- **samples** (list[Sample]) - A list of Sample objects, each containing id, text, and label.
- **pagination_info** (object) - Information about the pagination.
```
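
The `page`, `page_size`, and `class_filter` arguments suggest conventional filter-then-paginate behaviour. The sketch below shows one way such logic could work on plain dictionaries. It is not the server's implementation: 1-based page numbering, the default values, and the field names inside `pagination_info` are all assumptions, and `paginate` and `filter_by_class` are hypothetical helpers.

```python
import math


def filter_by_class(samples: list, class_filter=None) -> list:
    """Keep only samples whose label is in class_filter (no filter keeps everything)."""
    if class_filter is None:
        return samples
    allowed = set(class_filter)
    return [s for s in samples if s["label"] in allowed]


def paginate(items: list, page: int = 1, page_size: int = 10) -> dict:
    """Return one page of items plus basic pagination metadata (1-based pages)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    return {
        "items": items[start:start + page_size],
        "pagination_info": {
            "page": page,
            "page_size": page_size,
            "total_items": len(items),
            "total_pages": math.ceil(len(items) / page_size),
        },
    }


samples = [{"id": i, "text": f"utterance {i}", "label": i % 3} for i in range(7)]
page = paginate(filter_by_class(samples, class_filter=[0, 1]), page=2, page_size=2)
print(page)
```

Filtering before paginating keeps the page counts consistent with what the caller actually sees.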

--------------------------------

### Run All Project Tests

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Executes all automated tests for the project to ensure changes do not break existing features.

```bash
make test
```

--------------------------------

### Check Data Split Readiness in AutoIntent

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/concepts.rst

Use this function to validate if your data is suitable for splitting before fitting. It helps ensure proper handling of OOS samples.

```python
autointent.context.data_handler.check_split_readiness
```

--------------------------------

### Configure Log-Uniform Learning Rate

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/learn/automl_theory.rst

Sets bounds for a learning rate parameter, enabling log-uniform sampling for better exploration of learning rate values.

```yaml
learning_rate:
  low: 1.0e-5    # Prevent too slow learning
  high: 1.0e-2   # Prevent instability
  log: true      # Log-uniform sampling
```
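
Log-uniform sampling draws the logarithm of the value uniformly, so each order of magnitude between `low` and `high` is equally likely. The following standard-library sketch demonstrates the idea with the bounds above. `sample_log_uniform` is a hypothetical helper, not the sampler AutoIntent uses internally.

```python
import math
import random


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Sample so that log(value) is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(42)
samples = [sample_log_uniform(1e-5, 1e-2, rng) for _ in range(10_000)]

# Count how many samples land in each decade: [1e-5, 1e-4), [1e-4, 1e-3), [1e-3, 1e-2].
decade_counts = [0, 0, 0]
for value in samples:
    decade = int(math.floor(math.log10(value))) + 5
    decade = max(0, min(2, decade))  # guard against floating-point edge cases at the bounds
    decade_counts[decade] += 1

print(decade_counts)  # roughly equal thirds
```

With plain uniform sampling on the same interval, about 90% of draws would fall in the top decade, leaving small learning rates almost unexplored.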

--------------------------------

### Configure Data Splitting with Cross-Validation

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/learn/automl_theory.rst

Sets up data configuration for cross-validation, specifying the scheme, number of folds, validation size, and a separation ratio to prevent data leakage.

```python
from autointent.configs import DataConfig

data_config = DataConfig(
    scheme="cv",           # Cross-validation
    n_folds=5,             # 5-fold CV
    validation_size=0.2,   # 20% for validation in the hold-out (HO) scheme
    separation_ratio=0.5   # Prevent data leakage between modules
)
```
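
For readers unfamiliar with the `cv` scheme, the sketch below shows generic k-fold index splitting in plain Python. It is unshuffled and unstratified, and it says nothing about how AutoIntent builds its folds or applies `separation_ratio`. `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_samples: int, n_folds: int) -> list:
    """Split range(n_samples) into n_folds (train, validation) index pairs.

    Plain, unshuffled and unstratified k-fold, for illustration only.
    """
    base, remainder = divmod(n_samples, n_folds)
    # The first `remainder` folds get one extra sample so all samples are used.
    fold_sizes = [base + (1 if i < remainder else 0) for i in range(n_folds)]
    folds, start = [], 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, val_idx))
        start += size
    return folds


folds = k_fold_indices(n_samples=12, n_folds=5)
for fold_id, (train_idx, val_idx) in enumerate(folds):
    print(f"fold {fold_id}: train={len(train_idx)} validation={val_idx}")
```

Every sample appears in exactly one validation fold, so each candidate configuration is scored on all of the data across the five rounds.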

--------------------------------

### Data Format: Multi-Label Classification

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/quickstart.rst

Illustrates the dictionary structure for multi-label classification, using lists of 0s and 1s for labels.

```python
data = {
    "train": [
        {"utterance": "Book urgent flight to Paris", "label": [1, 0, 1]},
        {"utterance": "What's the weather?", "label": [0, 1, 0]}
    ]
}
```
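
If the raw annotations are sets of intent names, they can be converted to 0/1 vectors like those above with a small helper. The sketch below is illustrative: `to_multi_hot` is hypothetical, and the intent names are made up, since the source shows only the label vectors.

```python
def to_multi_hot(active_intents: set, intent_names: list) -> list:
    """Encode a set of intent names as a 0/1 vector ordered like intent_names."""
    unknown = active_intents - set(intent_names)
    if unknown:
        raise ValueError(f"Unknown intents: {sorted(unknown)}")
    return [1 if name in active_intents else 0 for name in intent_names]


# Hypothetical intent names; position i in every label vector refers to intent_names[i].
intent_names = ["book_flight", "weather", "urgent"]

data = {
    "train": [
        {
            "utterance": "Book urgent flight to Paris",
            "label": to_multi_hot({"book_flight", "urgent"}, intent_names),
        },
        {
            "utterance": "What's the weather?",
            "label": to_multi_hot({"weather"}, intent_names),
        },
    ]
}

print([sample["label"] for sample in data["train"]])  # [[1, 0, 1], [0, 1, 0]]
```

Keeping a single ordered list of names ensures every vector has the same length and the same meaning per position.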

--------------------------------

### Zero-Shot Intent Classification with BiEncoderDescriptionScorer

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/learn/text_embeddings.rst

Perform zero-shot intent classification by providing intent descriptions instead of training data. Requires fitting the scorer with descriptions and then predicting on new utterances.

```python
from autointent.modules.scoring import BiEncoderDescriptionScorer

scorer = BiEncoderDescriptionScorer()

# Intent descriptions instead of training data
descriptions = [
    "User wants to book a flight",
    "User wants to cancel a reservation",
    "User asks about flight status"
]

scorer.fit([], [], descriptions)
predictions = scorer.predict(["I want to fly to London"])
```
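
Conceptually, description-based zero-shot scoring compares an utterance embedding with one embedding per intent description and picks the closest match. The sketch below illustrates that idea with hand-written 3-dimensional vectors and cosine similarity. The vectors, the intent names, and the choice of cosine similarity are all assumptions for illustration, not a description of `BiEncoderDescriptionScorer` internals.

```python
import math


def cosine(u: list, v: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy "embeddings", made up for illustration; a real encoder would produce these.
description_vectors = {
    "book_flight": [0.9, 0.1, 0.0],
    "cancel_reservation": [0.1, 0.9, 0.1],
    "flight_status": [0.3, 0.1, 0.9],
}
utterance_vector = [0.8, 0.2, 0.1]  # stand-in for "I want to fly to London"

scores = {name: cosine(utterance_vector, vec) for name, vec in description_vectors.items()}
predicted = max(scores, key=scores.get)

print(predicted)
print({name: round(score, 3) for name, score in scores.items()})
```

Because no labelled utterances are involved, the quality of the descriptions largely determines how well the intents separate.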

--------------------------------

### Task-Specific Prompting for Embeddings

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/learn/text_embeddings.rst

Use task-specific prompts to optimize embedding generation for different use cases like search queries, document passages, or intent classification.

```python
query_embeddings = embedder.embed(queries, TaskTypeEnum.query)

doc_embeddings = embedder.embed(documents, TaskTypeEnum.passage)

intent_embeddings = embedder.embed(utterances, TaskTypeEnum.classification)
```

--------------------------------

### Check Balanced Class Distribution

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Calculates and prints the class distribution of the balanced training dataset. This verifies the effectiveness of the DatasetBalancer in achieving the desired class balance.

```python
# Check the balanced distribution
balanced_distribution = {}
for sample in balanced_dataset["train"]:
    label = sample[Dataset.label_feature]
    balanced_distribution[label] = balanced_distribution.get(label, 0) + 1

print(balanced_distribution)
```

--------------------------------

### Analyze Class Distribution

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Prints the number of samples in each class, then reports the sample counts of the most and least represented classes. Useful for understanding dataset imbalance.

```python
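# Count samples per class in the balanced training split
balanced_distribution = {}
for sample in balanced_dataset["train"]:
    label = sample[Dataset.label_feature]
    balanced_distribution[label] = balanced_distribution.get(label, 0) + 1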

print("Balanced class distribution:")
for class_id, count in sorted(balanced_distribution.items()):
    intent = next(i for i in dataset.intents if i.id == class_id)
    print(f"Class {class_id} ({intent.name}): {count} samples")

print(f"\nMost represented class: {max(balanced_distribution.values())} samples")
print(f"Least represented class: {min(balanced_distribution.values())} samples")
```
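
The snippet above reports only the sample counts of the largest and smallest classes. As a complement, here is a minimal standalone sketch that also names those classes and computes an imbalance ratio. It runs on plain dictionaries rather than an AutoIntent `Dataset`, and the `samples` list is made-up illustrative data.

```python
from collections import Counter

# Hypothetical samples in the {"utterance", "label"} shape used by the docs.
samples = [
    {"utterance": "Check my balance", "label": 0},
    {"utterance": "Show my transactions", "label": 0},
    {"utterance": "What is my balance?", "label": 0},
    {"utterance": "Send money to a friend", "label": 1},
]

# Count how many samples carry each label.
distribution = Counter(sample["label"] for sample in samples)

most_class, most_count = max(distribution.items(), key=lambda kv: kv[1])
least_class, least_count = min(distribution.items(), key=lambda kv: kv[1])
imbalance_ratio = most_count / least_count

print(f"Most represented: class {most_class} ({most_count} samples)")
print(f"Least represented: class {least_class} ({least_count} samples)")
print(f"Imbalance ratio: {imbalance_ratio:.1f}")
```

A ratio close to 1 indicates a roughly balanced split, while larger values indicate that the smallest class is under-represented.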

--------------------------------

### Regenerate Optimizer JSON Schema

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Regenerates the JSON schema for OptimizerConfig and related Pydantic models if they have changed.

```bash
make schema
```

--------------------------------

### Direct KNNScorer Usage

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/quickstart.rst

Initialize and use the KNNScorer module directly for intent classification. This involves fitting the scorer with training utterances and labels, then making predictions on new inputs.

```python
from autointent.modules import KNNScorer

# Initialize a specific scorer
scorer = KNNScorer(
    embedder_config="sentence-transformers/all-MiniLM-L6-v2",
    k=3
)

# Train on your data
train_utterances = [
    "Check my account balance",
    "Transfer money to account",
    "Show transaction history"
]
train_labels = [0, 1, 0]

scorer.fit(train_utterances, train_labels)

# Make predictions
predictions = scorer.predict([
    "What's my current balance?",
    "Send money to my friend"
])
```

--------------------------------

### Balance the Dataset

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Applies the DatasetBalancer to augment the dataset and balance the class distribution in the training split. `batch_size` controls the number of generations processed concurrently.

```python
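from autointent import Dataset

# Assumes `dataset` is a loaded Dataset and `balancer` is a configured DatasetBalancer.
# Create a copy of the dataset so the original is left unchanged
dataset_copy = Dataset.from_dict(dataset.to_dict())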

# Balance the training split
balanced_dataset = balancer.balance(
    dataset=dataset_copy,
    split="train",
    batch_size=2,  # Process generations in batches of 2
)
```

--------------------------------

### HTTP Server Endpoints

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/server.rst

The FastAPI-based HTTP server exposes endpoints for health checks and predictions. It expects JSON payloads and returns JSON responses.

```APIDOC
## GET /health

### Description
Checks the health status of the inference server.

### Method
GET

### Endpoint
/health

### Response
#### Success Response (200)
- **status** (string) - Indicates the server is healthy.

#### Response Example
{
  "status": "healthy"
}

## POST /predict

### Description
Performs intent prediction on a list of utterances.

### Method
POST

### Endpoint
/predict

### Parameters
#### Request Body
- **utterances** (list[string]) - Required - A list of text utterances to predict intents for.

### Request Example
{
  "utterances": ["text one", "text two"]
}

### Response
#### Success Response (200)
- **predictions** (list) - A list of predictions, one for each input utterance. The format depends on whether the pipeline is single-label or multi-label.

#### Response Example
{
  "predictions": [0, [1, 2]]
}
```
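
A client for the `/predict` endpoint can be sketched with the Python standard library alone. The base URL below is an assumption (adjust host and port to your deployment), and `build_predict_request` and `predict` are hypothetical helper names rather than part of AutoIntent.

```python
import json
import urllib.request

# Assumption: the server is reachable at this address.
BASE_URL = "http://localhost:8000"


def build_predict_request(
    utterances: list[str], base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST /predict request carrying the utterances as JSON."""
    body = json.dumps({"utterances": utterances}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(utterances: list[str], base_url: str = BASE_URL) -> list:
    """Send the request and return the "predictions" field of the response."""
    request = build_predict_request(utterances, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["predictions"]


# Building the request does not touch the network; calling predict() does.
request = build_predict_request(["text one", "text two"])
```

Separating request construction from sending keeps the payload easy to inspect and test without a running server.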

--------------------------------

### Data Format: Single-Label Classification

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/quickstart.rst

Defines the dictionary structure for single-label classification data with train, validation, and test splits.

```python
data = {
    "train": [
        {"utterance": "Hello, how are you?", "label": 0},
        {"utterance": "Book a flight to Paris", "label": 1},
        {"utterance": "What's the weather like?", "label": 2}
    ],
    "validation": [
        {"utterance": "Hi there!", "label": 0}
    ],
    "test": [
        {"utterance": "Good morning", "label": 0}
    ]
}
```
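
Before loading such a dictionary, it can help to sanity-check its shape. The sketch below is a hypothetical pre-check, not AutoIntent's own validation. It assumes that a `train` split is required, that every sample carries a string utterance and a non-negative integer label, and that every label used in another split also appears in `train`.

```python
def validate_single_label(data: dict) -> int:
    """Check sample shapes and return the number of distinct classes in "train"."""
    if "train" not in data:
        raise ValueError('missing required "train" split')
    for split, samples in data.items():
        for i, sample in enumerate(samples):
            if not isinstance(sample.get("utterance"), str):
                raise ValueError(f"{split}[{i}]: utterance must be a string")
            label = sample.get("label")
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(label, bool) or not isinstance(label, int) or label < 0:
                raise ValueError(f"{split}[{i}]: label must be a non-negative integer")
    train_labels = {sample["label"] for sample in data["train"]}
    all_labels = {s["label"] for samples in data.values() for s in samples}
    unseen = all_labels - train_labels
    if unseen:
        raise ValueError(f"labels never seen in train: {sorted(unseen)}")
    return len(train_labels)


data = {
    "train": [
        {"utterance": "Hello, how are you?", "label": 0},
        {"utterance": "Book a flight to Paris", "label": 1},
        {"utterance": "What's the weather like?", "label": 2},
    ],
    "validation": [{"utterance": "Hi there!", "label": 0}],
    "test": [{"utterance": "Good morning", "label": 0}],
}

n_classes = validate_single_label(data)
```

Running a check like this first gives clearer error messages for malformed samples than a failure later in training.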

--------------------------------

### AutoIntent EMNLP 2025 Paper Citation

Source: https://github.com/deeppavlov/autointent/blob/dev/README.md

Citation details for the AutoIntent EMNLP 2025 paper.

```bibtex
@misc{alekseev2025autointentautomltextclassification,
      title={AutoIntent: AutoML for Text Classification},
      author={Ilya Alekseev and Roman Solomatin and Darina Rustamova and Denis Kuznetsov},
      year={2025},
      eprint={2509.21138},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.21138},
}
```
