### Quickstart Example: Train and Predict

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/quickstart.rst

Demonstrates loading data, initializing a pipeline, training with AutoML, and making predictions.

```python
from autointent import Dataset, Pipeline

# Prepare your data
data = {
    "train": [
        {"utterance": "I want to check my account balance", "label": 0},
        {"utterance": "How do I transfer money?", "label": 1},
        {"utterance": "What's my current balance?", "label": 0},
        {"utterance": "I need to send money to my friend", "label": 1},
        {"utterance": "Can you help me make a payment?", "label": 1},
        {"utterance": "Show me my transaction history", "label": 0},
        {"utterance": "Can you show me my account details?", "label": 0},
        {"utterance": "I want to send funds to someone", "label": 1},
        {"utterance": "What is my available balance?", "label": 0},
        {"utterance": "How can I make a transfer?", "label": 1},
        {"utterance": "Please help me with a payment", "label": 1},
        {"utterance": "I need to view my recent transactions", "label": 0}
    ],
    "validation": [
        {"utterance": "Display my account info", "label": 0},
        {"utterance": "I want to transfer funds", "label": 1}
    ]
}

# Load data into AutoIntent
dataset = Dataset.from_dict(data)

# Initialize and train the AutoML pipeline
pipeline = Pipeline.from_preset("classic-light")
pipeline.fit(dataset)

# Make predictions on new data
predictions = pipeline.predict([
    "What is my available balance?",
    "Transfer money to John"
])
```

--------------------------------

### Install Project Dependencies

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Installs all project dependencies using the make install command.

```bash
make install
```

--------------------------------

### Install AutoIntent

Source: https://github.com/deeppavlov/autointent/blob/dev/README.md

Install the AutoIntent library using pip.

```bash
pip install autointent
```

--------------------------------

### Install Autointent with OpenAI support

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/intent_description.rst

Install the Autointent library with the necessary dependencies for OpenAI integration.

```bash
pip install "autointent[openai]"
```

--------------------------------

### Install AutoIntent with Weights & Biases Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with the 'wandb' extra for Weights & Biases experiment logging integration.

```bash
pip install "autointent[wandb]"
```

--------------------------------

### Install AutoIntent with CatBoost Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with the 'catboost' extra, enabling CatBoostScorer and CatBoost-based tuning paths.

```bash
pip install "autointent[catboost]"
```

--------------------------------

### Install uv Dependency Manager

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Installs the uv dependency manager using a curl script. Refer to uv documentation for detailed installation instructions.

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

--------------------------------

### Example .env Configuration

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/server.rst

Example environment file settings for AutoIntent servers, including pipeline path and optional host/port configurations for HTTP and MCP over HTTP.

```text
AUTOINTENT_PATH=/path/to/my_autointent_project
# Optional HTTP defaults:
# AUTOINTENT_HOST=0.0.0.0
# AUTOINTENT_PORT=8013
# Optional MCP over HTTP:
# AUTOINTENT_TRANSPORT=http
# AUTOINTENT_PORT=8012
```

--------------------------------

### Install Autointent with DSPy support

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/dspy_augmentation.rst

Install the autointent library with DSPy dependencies. Ensure you have the required packages for DSPy functionality.

```bash
pip install "autointent[dspy]"
```

--------------------------------

### Development Install from Git

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Clones the AutoIntent repository and installs all development dependencies using make. This provides the full contributor set.

```bash
git clone https://github.com/deeppavlov/AutoIntent.git
cd AutoIntent
make install
```

--------------------------------

### Create Optuna Study with Warm Starting

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/learn/automl_theory.rst

Initialize an Optuna study, enabling warm starting by loading an existing study if it is found. This allows resuming interrupted optimization processes.

```python
import optuna

# Optimization state is automatically saved
study = optuna.create_study(
    study_name="intent_classification",
    storage="sqlite:///optuna.db",
    load_if_exists=True
)
```

--------------------------------

### Install AutoIntent with Transformers and PEFT

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with both 'transformers' and 'peft' extras for transformer presets and fine-tuning.

```bash
pip install "autointent[transformers,peft]"
```

--------------------------------

### Install AutoIntent with vLLM Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with the 'vllm' extra, enabling vLLM as an optional high-throughput inference backend where supported.

```bash
pip install "autointent[vllm]"
```

--------------------------------

### Install AutoIntent with OpenSearch Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with the 'opensearch' extra, enabling the OpenSearch client for OpenSearch-backed vector and retrieval integrations.

```bash
pip install "autointent[opensearch]"
```

--------------------------------

### Install AutoIntent with FastMCP Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with the 'fastmcp' extra for FastMCP-based MCP server integration.

```bash
pip install "autointent[fastmcp]"
```

--------------------------------

### Install AutoIntent with CodeCarbon Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with the 'codecarbon' extra for CodeCarbon energy and emissions tracking during runs.

```bash
pip install "autointent[codecarbon]"
```

--------------------------------

### Install AutoIntent with FastAPI Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with the 'fastapi' extra, including the HTTP serving stack (FastAPI, Uvicorn) for the AutoIntent server mode.

```bash
pip install "autointent[fastapi]"
```

--------------------------------

### Install AutoIntent with Sentence Transformers and CatBoost

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with both 'sentence-transformers' and 'catboost' extras for classic embedding and gradient boosting pipelines.

```bash
pip install "autointent[sentence-transformers,catboost]"
```

--------------------------------

### Configure OpenAI Client for Autointent

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/intent_description.rst

Example of configuring an OpenAI-compatible client with a custom base URL and API key for use with the Autointent module.

```python
import openai

client = openai.AsyncOpenAI(
    base_url="your-api-base-url",
    api_key="your-api-key"
)
```

--------------------------------

### Run HTTP Server with Uvicorn

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/server.rst

Start the AutoIntent HTTP server using Uvicorn, specifying the module path and host/port.

```bash
uvicorn autointent.server.http:app --host 127.0.0.1 --port 8013
```

--------------------------------

### Install AutoIntent with PEFT Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with the 'peft' extra, enabling parameter-efficient fine-tuning methods like LoRA when used with transformer presets.

```bash
pip install "autointent[peft]"
```

--------------------------------

### Setup Generator and Template

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Initializes the Generator and EnglishSynthesizerTemplate required by DatasetBalancer. The generator uses an LLM for utterance creation, and the template defines the prompt format.

```python
# Initialize a generator (uses OpenAI API by default)
generator = Generator()

# Create a template for generating utterances
template = EnglishSynthesizerTemplate(dataset=dataset, split="train")
```

--------------------------------

### Install AutoIntent with Sentence Transformers Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent along with the 'sentence-transformers' extra, enabling SentenceTransformer embedders and related pipelines.

```bash
pip install "autointent[sentence-transformers]"
```

--------------------------------

### Install AutoIntent with Transformers Extra

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/installation.rst

Installs AutoIntent with the 'transformers' extra, enabling Hugging Face transformers models for transformer presets and modules.

```bash
pip install "autointent[transformers]"
```

--------------------------------

### Run MCP Server (Stdio Transport)

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/server.rst

Start the AutoIntent MCP server using the default stdio transport via its Python module entrypoint.

```bash
python -c "from autointent.server.mcp import main; main()"
```

--------------------------------

### Install VSCode Ruff Extension

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Provides a link to install the ruff extension for VSCode to help track code style errors directly in the editor.

```text
https://marketplace.visualstudio.com/items?itemName=charliermarsh.ruff
```

--------------------------------

### Build and Train Intent Classifier

Source: https://github.com/deeppavlov/autointent/blob/dev/README.md

Example of building an intent classifier using AutoIntent. Load a dataset, select a preset pipeline, and train it.

```python
from autointent import Pipeline, Dataset

dataset = Dataset.from_json(path_to_json)
pipeline = Pipeline.from_preset("classic-light")
pipeline.fit(dataset)
pipeline.predict(["show me my latest transactions"])
```

--------------------------------

### Minimal Sketch for Adversarial Augmentation

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/adversarial.rst

This Python code demonstrates a minimal setup for adversarial human-like augmentation. It initializes a Dataset, an LLM Generator, a CriticHumanLike, and a HumanUtteranceGenerator to augment training data.

```python
from autointent import Dataset
from autointent.generation import Generator
from autointent.generation.utterances import CriticHumanLike, HumanUtteranceGenerator

dataset = Dataset.from_dict({...})  # your train split, with intent names if you use them in prompts
llm = Generator(model_name="gpt-4o-mini")
critic = CriticHumanLike(generator=llm)
augmenter = HumanUtteranceGenerator(generator=llm, critic=critic, async_mode=False)
new_samples = augmenter.augment(dataset, split_name="train", n_final_per_class=3)
```

--------------------------------

### Create Sample Imbalanced Dataset

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Defines a sample dataset with imbalanced class distribution for demonstration purposes. Includes intents and training examples.

```python
from autointent import Dataset
from autointent.generation.utterances.balancer import DatasetBalancer
from autointent.generation.utterances.generator import Generator
from autointent.generation.chat_templates import EnglishSynthesizerTemplate

# Create a simple imbalanced dataset
sample_data = {
    "intents": [
        {"id": 0, "name": "restaurant_booking", "description": "Booking a table at a restaurant"},
        {"id": 1, "name": "weather_query", "description": "Checking weather conditions"},
        {"id": 2, "name": "navigation", "description": "Getting directions to a location"},
    ],
    "train": [
        # Restaurant booking examples (5)
        {"utterance": "Book a table for two tonight", "label": 0},
        {"utterance": "I need a reservation at Le Bistro", "label": 0},
        {"utterance": "Can you reserve a table for me?", "label": 0},
        {"utterance": "I want to book a restaurant for my anniversary", "label": 0},
        {"utterance": "Make a dinner reservation for 8pm", "label": 0},
        # Weather query examples (3)
        {"utterance": "What's the weather like today?", "label": 1},
        {"utterance": "Will it rain tomorrow?", "label": 1},
        {"utterance": "Weather forecast for New York", "label": 1},
        # Navigation example (1)
        {"utterance": "How do I get to the museum?", "label": 2},
    ]
}

# Create the dataset
dataset = Dataset.from_dict(sample_data)
```

--------------------------------

### Examine Generated Examples for a Specific Class

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Retrieves and prints original and generated utterances for a specified class ID. Helps in quality control and understanding the augmentation process.

```python
# Navigation class (Class 2)
navigation_class_id = 2
intent = next(i for i in dataset.intents if i.id == navigation_class_id)

print(f"Examples for class {navigation_class_id} ({intent.name}):")

# Original examples
original_examples = [
    s[Dataset.utterance_feature] for s in dataset["train"] if s[Dataset.label_feature] == navigation_class_id
]
print("\nOriginal examples:")
for i, example in enumerate(original_examples, 1):
    print(f"{i}. {example}")

# Generated examples
all_examples = [
    s[Dataset.utterance_feature] for s in balanced_dataset["train"] if s[Dataset.label_feature] == navigation_class_id
]
generated_examples = [ex for ex in all_examples if ex not in original_examples]
print("\nGenerated examples:")
for i, example in enumerate(generated_examples, 1):
    print(f"{i}. {example}")
```

--------------------------------

### Run Specific Project Test

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Runs a specific test file using pytest, with 'tests/modules/scoring/test_bert.py' as an example.

```bash
uv run pytest tests/modules/scoring/test_bert.py
```

--------------------------------

### AutoML Pipeline Predictions

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/quickstart.rst

Perform batch predictions using a trained AutoIntent pipeline. Provide a list of user utterances to get intent classification results.

```python
# Batch predictions
results = pipeline.predict([
    "What's my account balance?",
    "Transfer $100 to John",
    "Show me recent transactions"
])
```

--------------------------------

### Build and Serve Documentation Locally

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Builds the HTML documentation and hosts it locally for preview.

```bash
make serve-docs
```

--------------------------------

### Build HTML Documentation

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Builds the HTML version of the documentation and places it in the 'docs/build' folder.

```bash
make docs
```

--------------------------------

### Dry-run Multi-Version Documentation Build

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Run this command locally to test the multi-version documentation build process before a release. Ensure you have the full git history and tags available.

```bash
make multi-version-docs
```

--------------------------------

### Run HTTP Server via Module Entrypoint

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/server.rst

Execute the HTTP server using its Python module entrypoint, which respects AUTOINTENT_HOST and AUTOINTENT_PORT settings.

```bash
python -c "from autointent.server.http import main; main()"
```

--------------------------------

### Load Pipeline Presets for Different Budgets

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/learn/automl_theory.rst

Loads predefined pipeline configurations ('classic-light', 'classic-heavy', 'zero-shot-encoders') optimized for speed, performance, or zero-shot capabilities.

```python
from autointent import Pipeline

# Different computational budgets
pipeline_light = Pipeline.from_preset("classic-light")  # Speed-focused
pipeline_heavy = Pipeline.from_preset("classic-heavy")  # Performance-focused

# Different model types
pipeline_zero_shot = Pipeline.from_preset("zero-shot-encoders")  # No training data
```

--------------------------------

### Check Type Hints

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Verifies type hints in the project using the make typing command.

```bash
make typing
```

--------------------------------

### Generate Intent Descriptions with Autointent

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/intent_description.rst

Demonstrates how to use the `generate_descriptions` function to enhance a dataset with LLM-generated intent descriptions. Requires an OpenAI client and a custom prompt template.

```python
import openai
from autointent import Dataset
from autointent.generation.intents import generate_descriptions
from autointent.generation.chat_templates import PromptDescription

client = openai.AsyncOpenAI(
    api_key="your-api-key"
)

dataset = Dataset.from_hub("AutoIntent/clinc150_subset")

prompt = PromptDescription(
    text="Describe intent {intent_name} with examples: {user_utterances} and patterns: {regex_patterns}",
)

enhanced_dataset = generate_descriptions(
    dataset=dataset,
    client=client,
    prompt=prompt,
    model_name="gpt-4o-mini",
)

enhanced_dataset.to_csv("enhanced_clinc150.csv")
```

--------------------------------

### Loading Data with AutoIntent

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/quickstart.rst

Shows how to load data into an AutoIntent Dataset from a dictionary, JSON file, or Hugging Face Hub.

```python
from autointent import Dataset

# From dictionary
dataset = Dataset.from_dict(data)

# From JSON file
dataset = Dataset.from_json("/path/to/your/data.json")

# From Hugging Face Hub
dataset = Dataset.from_hub("your-username/your-dataset")
```

--------------------------------

### AutoML Pipeline Presets

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/quickstart.rst

Initialize an AutoIntent pipeline using different preset configurations for various scenarios, from fast training to experimental transformer models. The pipeline is then trained on a dataset.

```python
from autointent import Pipeline

# Our quick and accurate SoTA
pipeline = Pipeline.from_preset("classic-light")

# If you have more training time
pipeline = Pipeline.from_preset("classic-heavy")

# Experimental preset with fine-tuning methods
pipeline = Pipeline.from_preset("transformers-light")

# Train the pipeline
pipeline.fit(dataset)
```

--------------------------------

### Initialize DatasetBalancer

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Creates an instance of DatasetBalancer with the specified generator, prompt maker, and balancing parameters. `max_samples_per_class` determines the target number of samples for each class.

```python
balancer = DatasetBalancer(
    generator=generator,
    prompt_maker=template,
    async_mode=False,  # Set to True for faster generation with async processing
    max_samples_per_class=5,  # Each class will have exactly 5 samples after balancing
)
```

--------------------------------

### Synchronize Documentation Dependencies

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Synchronizes dependencies for documentation builds, including extra groups like 'catboost', 'peft', 'transformers', 'sentence-transformers', and 'openai'. Pandoc is also required.

```bash
uv sync --group docs --extra catboost --extra peft --extra transformers --extra sentence-transformers --extra openai
```

--------------------------------

### Configure Dataset Balancing to Exact Sample Count

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Sets up a DatasetBalancer to ensure each class has exactly 10 samples. Requires a generator and a prompt maker.

```python
# To bring all classes to exactly 10 samples
original_dataset = Dataset.from_dict(sample_data)
exact_template = EnglishSynthesizerTemplate(dataset=original_dataset, split="train")

exact_balancer = DatasetBalancer(
    generator=generator,
    prompt_maker=exact_template,
    max_samples_per_class=10
)
```

--------------------------------

### Run Documentation Tests

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Executes doctests, similar to CI checks on PRs and pushes to the 'dev' branch.

```bash
make test-docs
```

--------------------------------

### Configure Hyperparameter Optimization

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/learn/automl_theory.rst

Set up the HPOConfig for AutoIntent, specifying the sampler, number of trials, startup trials, timeout, and parallel jobs.

```python
hpo_config = HPOConfig(
    sampler="tpe",
    n_trials=50,          # Total optimization budget
    n_startup_trials=10,  # Random initialization
    timeout=3600,         # 1-hour time limit
    n_jobs=4              # Parallel trials
)
```

--------------------------------

### Configure Dataset Balancing to Max Sample Count

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Sets up a DatasetBalancer to balance classes to the level of the most represented class. `max_samples_per_class=None` achieves this.

```python
# Balance to the level of the most represented class
max_template = EnglishSynthesizerTemplate(dataset=original_dataset, split="train")

max_balancer = DatasetBalancer(
    generator=generator,
    prompt_maker=max_template,
    max_samples_per_class=None  # Will use the count of the most represented class
)
```

--------------------------------

### Clean and Rebuild Documentation

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Cleans the documentation build artifacts and then rebuilds the HTML documentation. Useful if the build is stale or links appear incorrect.

```bash
make clean-docs
make docs
```

--------------------------------

### Configure Search Space with KNN and Linear Modules

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/learn/automl_theory.rst

Defines a search space for hyperparameter tuning, specifying modules like 'knn' with parameter ranges for 'k' and 'weights', and 'linear' with 'cv' options.

```yaml
search_space:
  - node_type: scoring
    target_metric: scoring_f1
    search_space:
      - module_name: knn
        k:
          low: 1
          high: 20
        weights: [uniform, distance, closest]
      - module_name: linear
        cv: [3, 5, 10]
```

--------------------------------

### Build an Intent Classifier with AutoIntent

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/index.rst

Use this snippet to quickly build an intent classifier. It requires loading a dataset from a JSON file and fitting a pre-configured pipeline.

```python
from autointent import Pipeline, Dataset

dataset = Dataset.from_json(path_to_json)
pipeline = Pipeline.from_preset("classic-light")
pipeline.fit(dataset)
pipeline.predict(["show me my latest recent transactions"])
```

--------------------------------

### Lint and Format Code

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Checks code style and applies formatting using the make lint command.

```bash
make lint
```

--------------------------------

### Check Initial Class Distribution

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Calculates and prints the class distribution of the initial training dataset. This helps visualize the imbalance before applying the balancing process.

```python
# Check the initial distribution of classes in the training set
initial_distribution = {}
for sample in dataset["train"]:
    label = sample[Dataset.label_feature]
    initial_distribution[label] = initial_distribution.get(label, 0) + 1

print("Initial class distribution:")
for class_id, count in sorted(initial_distribution.items()):
    intent = next(i for i in dataset.intents if i.id == class_id)
    print(f"Class {class_id} ({intent.name}): {count} samples")

print(f"\nMost represented class: {max(initial_distribution.values())} samples")
print(f"Least represented class: {min(initial_distribution.values())} samples")
```

--------------------------------

### Augment Dataset using DSPYIncrementalUtteranceEvolver

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/dspy_augmentation.rst

Augment a dataset using the DSPYIncrementalUtteranceEvolver. Configure API keys, model, and augmentation parameters. Refer to LiteLLM documentation for model configuration.

```python
import os
os.environ["OPENAI_API_KEY"] = "your-api-key"

from autointent import Dataset
from autointent.custom_types import Split
# Assumed import path: the source snippet uses this class without importing it
from autointent.generation.utterances import DSPYIncrementalUtteranceEvolver

dataset = Dataset.from_hub("AutoIntent/clinc150_subset")

evolver = DSPYIncrementalUtteranceEvolver(
    "openai/gpt-4o-mini"
)

augmented_dataset = evolver.augment(
    dataset,
    split_name=Split.TEST,
    n_evolutions=1,
    mipro_init_params={
        "auto": "light",
    },
    mipro_compile_params={
        "minibatch": False,
    },
)

augmented_dataset.to_csv("clinc150_dspy_augment.csv")
```

--------------------------------

### MCP Server Tools

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/server.rst

The FastMCP server provides tools for prediction, class listing, and retrieving training data, accessible via stdio or HTTP transport.

```APIDOC
## MCP Tools

### predict

#### Description
Performs intent prediction on a list of utterances.

#### Arguments
- **utterances** (list[str]) - A list of text utterances to predict intents for.

#### Returns
- **predictions** (list) - A list of predictions, similar to the HTTP API.

### classes

#### Description
Retrieves a list of available intents (classes).

#### Arguments
- **page** (int) - Optional - The page number for pagination.
- **page_size** (int) - Optional - The number of items per page.

#### Returns
- **classes** (list[Intent]) - A list of Intent objects, each containing id, name, tags, regex fields, and description.
- **pagination_info** (object) - Information about the pagination.

### train_data

#### Description
Retrieves training data samples.

#### Arguments
- **page** (int) - Optional - The page number for pagination.
- **page_size** (int) - Optional - The number of items per page.
- **class_filter** (list[int]) - Optional - A list of class IDs to filter the training data by.

#### Returns
- **samples** (list[Sample]) - A list of Sample objects, each containing id, text, and label.
- **pagination_info** (object) - Information about the pagination.
```

--------------------------------

### Run All Project Tests

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Executes all automated tests for the project to ensure changes do not break existing features.

```bash
make test
```

--------------------------------

### Check Data Split Readiness in AutoIntent

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/concepts.rst

Use this function to validate if your data is suitable for splitting before fitting. It helps ensure proper handling of OOS samples.

```python
autointent.context.data_handler.check_split_readiness
```

--------------------------------

### Configure Log-Uniform Learning Rate

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/learn/automl_theory.rst

Sets bounds for a learning rate parameter, enabling log-uniform sampling for better exploration of learning rate values.

```yaml
learning_rate:
  low: 1.0e-5   # Prevent too slow learning
  high: 1.0e-2  # Prevent instability
  log: true     # Log-uniform sampling
```

--------------------------------

### Configure Data Splitting with Cross-Validation

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/learn/automl_theory.rst

Sets up data configuration for cross-validation, specifying the scheme, number of folds, validation size, and a separation ratio to prevent data leakage.

```python
from autointent.configs import DataConfig

data_config = DataConfig(
    scheme="cv",            # Cross-validation
    n_folds=5,              # 5-fold CV
    validation_size=0.2,    # 20% for validation in HO
    separation_ratio=0.5    # Prevent data leakage between modules
)
```

--------------------------------

### Data Format: Multi-Label Classification

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/quickstart.rst

Illustrates the dictionary structure for multi-label classification, using lists of 0s and 1s for labels.

```python
data = {
    "train": [
        {"utterance": "Book urgent flight to Paris", "label": [1, 0, 1]},
        {"utterance": "What's the weather?", "label": [0, 1, 0]}
    ]
}
```

--------------------------------

### Zero-Shot Intent Classification with BiEncoderDescriptionScorer

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/learn/text_embeddings.rst

Perform zero-shot intent classification by providing intent descriptions instead of training data. Requires fitting the scorer with descriptions and then predicting on new utterances.

```python
from autointent.modules.scoring import BiEncoderDescriptionScorer

scorer = BiEncoderDescriptionScorer()

# Intent descriptions instead of training data
descriptions = [
    "User wants to book a flight",
    "User wants to cancel a reservation",
    "User asks about flight status"
]

scorer.fit([], [], descriptions)
predictions = scorer.predict(["I want to fly to London"])
```

--------------------------------

### Task-Specific Prompting for Embeddings

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/learn/text_embeddings.rst

Use task-specific prompts to optimize embedding generation for different use cases like search queries, document passages, or intent classification.

```python
query_embeddings = embedder.embed(queries, TaskTypeEnum.query)
doc_embeddings = embedder.embed(documents, TaskTypeEnum.passage)
intent_embeddings = embedder.embed(utterances, TaskTypeEnum.classification)
```

--------------------------------

### Check Balanced Class Distribution

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Calculates and prints the class distribution of the balanced training dataset. This verifies the effectiveness of the DatasetBalancer in achieving the desired class balance.

```python
# Check the balanced distribution: count the samples per class
balanced_distribution = {}
for sample in balanced_dataset["train"]:
    label = sample[Dataset.label_feature]
    balanced_distribution[label] = balanced_distribution.get(label, 0) + 1
```

--------------------------------

### Analyze Class Distribution

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Prints the distribution of samples across different classes and identifies the most and least represented classes. Useful for understanding dataset imbalance. This block continues from the counting loop in the "Check Balanced Class Distribution" entry.

```python
print("Balanced class distribution:")
for class_id, count in sorted(balanced_distribution.items()):
    intent = next(i for i in dataset.intents if i.id == class_id)
    print(f"Class {class_id} ({intent.name}): {count} samples")

print(f"\nMost represented class: {max(balanced_distribution.values())} samples")
print(f"Least represented class: {min(balanced_distribution.values())} samples")
```

--------------------------------

### Regenerate Optimizer JSON Schema

Source: https://github.com/deeppavlov/autointent/blob/dev/CONTRIBUTING.md

Regenerates the JSON schema for OptimizerConfig and related Pydantic models if they have changed.

```bash
make schema
```

--------------------------------

### Direct KNNScorer Usage

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/quickstart.rst

Initialize and use the KNNScorer module directly for intent classification. This involves fitting the scorer with training utterances and labels, then making predictions on new inputs.

```python
from autointent.modules import KNNScorer

# Initialize a specific scorer
scorer = KNNScorer(
    embedder_config="sentence-transformers/all-MiniLM-L6-v2",
    k=3
)

# Train on your data
train_utterances = [
    "Check my account balance",
    "Transfer money to account",
    "Show transaction history"
]
train_labels = [0, 1, 0]

scorer.fit(train_utterances, train_labels)

# Make predictions
predictions = scorer.predict([
    "What's my current balance?",
    "Send money to my friend"
])
```

--------------------------------

### Balance the Dataset

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/augmentation_tutorials/balancer.rst

Applies the DatasetBalancer to augment the dataset and balance the class distribution in the training split. `batch_size` controls the number of generations processed concurrently.

```python
# Create a copy of the dataset
dataset_copy = Dataset.from_dict(dataset.to_dict())

# Balance the training split
balanced_dataset = balancer.balance(
    dataset=dataset_copy,
    split="train",
    batch_size=2,  # Process generations in batches of 2
)
```

--------------------------------

### HTTP Server Endpoints

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/server.rst

The FastAPI-based HTTP server exposes endpoints for health checks and predictions. It expects JSON payloads and returns JSON responses.

```APIDOC
## GET /health

### Description
Checks the health status of the inference server.

### Method
GET

### Endpoint
/health

### Response
#### Success Response (200)
- **status** (string) - Indicates the server is healthy.

#### Response Example
{
  "status": "healthy"
}

## POST /predict

### Description
Performs intent prediction on a list of utterances.

### Method
POST

### Endpoint
/predict

### Parameters
#### Request Body
- **utterances** (list[string]) - Required - A list of text utterances to predict intents for.
### Request Example
{
  "utterances": ["text one", "text two"]
}

### Response
#### Success Response (200)
- **predictions** (list) - A list of predictions, one for each input utterance. The format depends on whether the pipeline is single-label or multi-label.

#### Response Example
{
  "predictions": [0, [1, 2]]
}
```

--------------------------------

### Data Format: Single-Label Classification

Source: https://github.com/deeppavlov/autointent/blob/dev/docs/source/quickstart.rst

Defines the dictionary structure for single-label classification data with train, validation, and test splits.

```python
data = {
    "train": [
        {"utterance": "Hello, how are you?", "label": 0},
        {"utterance": "Book a flight to Paris", "label": 1},
        {"utterance": "What's the weather like?", "label": 2}
    ],
    "validation": [
        {"utterance": "Hi there!", "label": 0}
    ],
    "test": [
        {"utterance": "Good morning", "label": 0}
    ]
}
```

--------------------------------

### AutoIntent EMNLP 2025 Paper Citation

Source: https://github.com/deeppavlov/autointent/blob/dev/README.md

Citation details for the AutoIntent EMNLP 2025 paper.

```bibtex
@misc{alekseev2025autointentautomltextclassification,
      title={AutoIntent: AutoML for Text Classification},
      author={Ilya Alekseev and Roman Solomatin and Darina Rustamova and Denis Kuznetsov},
      year={2025},
      eprint={2509.21138},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.21138},
}
```
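--------------------------------

### Calling the HTTP Server from Python (Sketch)

The HTTP Server Endpoints entry above documents `GET /health` and `POST /predict` with JSON bodies. The following is a minimal client sketch using only the Python standard library, not code from the AutoIntent repository. The base URL (`http://localhost:8000`) and the helper names `build_predict_payload`, `parse_predictions`, `predict`, and `is_healthy` are assumptions made for illustration; use the host and port your server is actually configured with.

```python
import json
from urllib import request

# Assumption: address of a running AutoIntent HTTP server; adjust as needed.
BASE_URL = "http://localhost:8000"


def build_predict_payload(utterances: list[str]) -> bytes:
    """Encode utterances as the JSON body documented for POST /predict."""
    return json.dumps({"utterances": utterances}).encode("utf-8")


def parse_predictions(body: bytes) -> list:
    """Extract the `predictions` list from a /predict response body."""
    return json.loads(body)["predictions"]


def predict(utterances: list[str], base_url: str = BASE_URL) -> list:
    """Hypothetical helper: POST utterances and return the predictions."""
    req = request.Request(
        f"{base_url}/predict",
        data=build_predict_payload(utterances),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return parse_predictions(resp.read())


def is_healthy(base_url: str = BASE_URL) -> bool:
    """Hypothetical helper: True if GET /health reports status "healthy"."""
    with request.urlopen(f"{base_url}/health") as resp:
        return json.loads(resp.read()).get("status") == "healthy"
```

With a server running, `predict(["text one", "text two"])` would return one prediction per utterance: a single label each for a single-label pipeline, or a list of labels each for a multi-label pipeline.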