### Complete DataFrameIt Example with File Export

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/quickstart.md

A comprehensive example combining Pydantic model definition, data preparation, DataFrameIt processing, and saving the results to an Excel file. This illustrates a full workflow from data input to output.

```python
from pydantic import BaseModel
from typing import Literal
import pandas as pd
from dataframeit import dataframeit

# 1. Pydantic Model
class Sentiment(BaseModel):
    sentiment: Literal['positive', 'negative', 'neutral']
    confidence: Literal['high', 'medium', 'low']

# 2. Data
df = pd.DataFrame({
    'text': [
        'Excellent product! Exceeded expectations.',
        'Terrible service, never buying again.',
        'Delivery ok, average product.'
    ]
})

# 3. Process
result = dataframeit(df, Sentiment, "Analyze the sentiment of the text.", text_column='text')

# 4. Save
result.to_excel('result.xlsx', index=False)
```

--------------------------------

### Install DataFrameIt with All LLM Provider Support

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/installation.md

Installs the DataFrameIt library with support for all available LLM providers. This command installs all optional extras.

```bash
pip install dataframeit[all]
```

--------------------------------

### Install DataFrameIt with OpenAI Support

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/installation.md

Installs the DataFrameIt library together with the `openai` extra, which pulls in the dependencies needed to call OpenAI models.

```bash
pip install dataframeit[openai]
```

--------------------------------

### Install DataFrameIt with Google Gemini Support

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/installation.md

Installs the DataFrameIt library together with the `google` extra, which pulls in the dependencies needed to call Google Gemini models.

```bash
pip install dataframeit[google]
```

--------------------------------

### Run Jupyter Notebook Server (Bash)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/examples/index.md

This command starts the Jupyter Notebook server, allowing you to access and run the example notebooks through your web browser. The 'example/' argument specifies the directory to serve.

```bash
jupyter notebook example/
```

--------------------------------

### Install DataFrameIt with Anthropic Support

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/installation.md

Installs the DataFrameIt library together with the `anthropic` extra, which pulls in the dependencies needed to call Anthropic models.

```bash
pip install dataframeit[anthropic]
```

--------------------------------

### Install DataFrameIt Dependencies (Bash)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/examples/index.md

These commands install the necessary Python dependencies for DataFrameIt, including optional support for Google services, and the Jupyter Notebook environment.

```bash
pip install dataframeit[google]
pip install jupyter
```

--------------------------------

### Verify DataFrameIt Installation

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/installation.md

A simple Python script to verify that DataFrameIt has been installed successfully. It imports the library and prints a success message.

```python
from dataframeit import dataframeit
print("DataFrameIt installed successfully!")
```
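
To check for the package without triggering an `ImportError`, the standard library's `importlib.util.find_spec` can be used. This is a minimal sketch; `is_installed` is a hypothetical helper and not part of DataFrameIt.

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None

if __name__ == "__main__":
    status = "found" if is_installed("dataframeit") else "missing"
    print(f"dataframeit: {status}")
```

The same helper can be pointed at an optional dependency to see whether a given extra was installed.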

--------------------------------

### Install DataFrameIt with Google Gemini and Polars Support

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/installation.md

Installs the DataFrameIt library with support for Google Gemini LLM and the Polars data manipulation library. This is for users who prefer Polars over Pandas.

```bash
pip install dataframeit[google,polars]
```

--------------------------------

### Prepare Input Data for DataFrameIt

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/quickstart.md

Shows how to prepare input data for DataFrameIt using different formats: pandas DataFrame, a list of strings, or a dictionary. This allows flexibility in how you provide the text data to be processed.

```python
import pandas as pd

df = pd.DataFrame({
    'text': [
        'Excellent product! Exceeded expectations.',
        'Terrible service, never buying again.',
        'Delivery ok, average product.'
    ]
})
```

```python
texts = [
    'Excellent product! Exceeded expectations.',
    'Terrible service, never buying again.',
    'Delivery ok, average product.'
]
```

```python
texts = {
    'review_001': 'Excellent product! Exceeded expectations.',
    'review_002': 'Terrible service, never buying again.',
    'review_003': 'Delivery ok, average product.'
}
```

--------------------------------

### Install and Configure Cohere Provider for DataFrameIt

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/providers.md

This snippet covers the installation of the Cohere provider and setting the `COHERE_API_KEY`. It includes a Python example for calling `dataframeit` with a specified Cohere model.

```bash
pip install langchain-cohere
export COHERE_API_KEY="your-key"
```

```python
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='cohere',
    model='command-r-plus'
)
```

--------------------------------

### Clone DataFrameIt Repository (Bash)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/examples/index.md

This command clones the DataFrameIt repository from GitHub to your local machine. It's the first step in setting up the examples.

```bash
git clone https://github.com/bdcdo/dataframeit.git
cd dataframeit
```

--------------------------------

### Process Data with DataFrameIt

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/quickstart.md

Demonstrates the core functionality of DataFrameIt by processing a pandas DataFrame with a Pydantic model and a custom prompt. It specifies the text column for analysis and prints the resulting DataFrame.

```python
from dataframeit import dataframeit

result = dataframeit(
    df,                                      # Your data
    Sentiment,                               # Pydantic model
    "Analyze the sentiment of the text.",    # Prompt
    text_column='text'                       # Column name
)

print(result)
```

--------------------------------

### Install and Configure Mistral Provider for DataFrameIt

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/providers.md

This section provides instructions for installing the Mistral AI provider and setting the `MISTRAL_API_KEY`. A Python code example demonstrates how to use `dataframeit` with a Mistral model.

```bash
pip install langchain-mistralai
export MISTRAL_API_KEY="your-key"
```

```python
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='mistral',
    model='mistral-large-latest'
)
```

--------------------------------

### Install DataFrameIt with LLM Provider Support

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/index.md

These commands show how to install the dataframeit library with specific support for different LLM providers using pip. Users can choose to install support for Google Gemini, OpenAI, or Anthropic based on their needs. The `[provider]` syntax in pip allows for optional dependency installation.

```bash
pip install dataframeit[google]  # Google Gemini 3 (recommended)
```

```bash
pip install dataframeit[openai]  # OpenAI GPT-5
```

```bash
pip install dataframeit[anthropic]  # Anthropic Claude 4.5
```

--------------------------------

### Install and Configure Google Gemini Provider for DataFrameIt

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/providers.md

This snippet shows how to install the necessary package for Google Gemini support and set the API key using environment variables. It also provides Python code examples for using the `dataframeit` function with Google Gemini, both with default and explicit configurations, including passing model-specific keyword arguments.

```bash
pip install dataframeit[google]
export GOOGLE_API_KEY="your-key"
```

```python
# Default - no need to specify
result = dataframeit(df, Model, PROMPT, text_column='text')

# Explicit
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='google_genai',
    model='gemini-3.0-flash'
)

# With extra parameters
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='google_genai',
    model='gemini-2.5-pro',
    model_kwargs={
        'temperature': 0.2,
        'top_p': 0.9
    }
)
```

--------------------------------

### Install and Configure OpenAI Provider for DataFrameIt

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/providers.md

This section details the installation of the OpenAI provider for DataFrameIt and setting the `OPENAI_API_KEY` environment variable. It includes Python examples for calling `dataframeit` with different OpenAI models and configuring `model_kwargs` for advanced settings.

```bash
pip install dataframeit[openai]
export OPENAI_API_KEY="your-key"
```

```python
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='openai',
    model='gpt-5.2-mini'
)

# With advanced model
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='openai',
    model='gpt-5.2',
    model_kwargs={
        'temperature': 0.2
    }
)
```

--------------------------------

### Create Prompt Templates for LLM Interaction

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/concepts.md

Illustrates how to construct prompt templates for guiding the LLM. Templates can be simple strings or include placeholders like '{texto}' to dynamically insert input text.

```python
# Simple - text is automatically added at the end
PROMPT = "Classify the sentiment of the text."

# With placeholder - control where text appears
PROMPT = """
You are a specialized analyst.

Document:
{texto}

Extract the requested information from the document above.
"""
```
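
The two styles can be pictured with plain string handling. This is a minimal sketch of the behaviour described above, not DataFrameIt's actual implementation; `build_prompt` is a hypothetical helper, and the blank-line separator used when appending is an assumption.

```python
def build_prompt(template: str, text: str, placeholder: str = "{texto}") -> str:
    """Insert text at the placeholder, or append it when no placeholder exists."""
    if placeholder in template:
        # str.replace is used instead of str.format, which would raise on
        # unrelated braces (for example a JSON sample inside the prompt)
        return template.replace(placeholder, text)
    # Assumed separator: a blank line between the instructions and the text
    return f"{template}\n\n{text}"

print(build_prompt("Classify the sentiment of the text.", "Great service!"))
print(build_prompt("Document:\n{texto}\n\nExtract the key facts.", "Great service!"))
```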

--------------------------------

### Install and Configure Anthropic Claude Provider for DataFrameIt

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/providers.md

Instructions for installing the Anthropic Claude provider and setting the `ANTHROPIC_API_KEY`. Python code examples demonstrate using `dataframeit` with various Claude models and specifying `model_kwargs`, such as `max_tokens`.

```bash
pip install dataframeit[anthropic]
export ANTHROPIC_API_KEY="your-key"
```

```python
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='anthropic',
    model='claude-sonnet-4.5'
)

# With max_tokens
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='anthropic',
    model='claude-opus-4.5',
    model_kwargs={
        'max_tokens': 4096
    }
)
```

--------------------------------

### Configure Google Gemini API Key

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/installation.md

Sets the GOOGLE_API_KEY environment variable for authenticating with Google Gemini. The API key can be obtained from the Google AI Studio.

```bash
export GOOGLE_API_KEY="your-google-key"
```

--------------------------------

### Configure Google API Key (Bash)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/examples/index.md

This command sets the GOOGLE_API_KEY environment variable, which is required for certain functionalities within DataFrameIt that interact with Google services. Replace 'your-key' with your actual API key.

```bash
export GOOGLE_API_KEY="your-key"
```

--------------------------------

### Complete DataFrameIt Example with Pydantic Model

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/reference/llm-reference.md

Demonstrates a full workflow using DataFrameIt. It includes defining a Pydantic model for structured output, preparing a pandas DataFrame, calling the dataframeit function, and printing the resulting DataFrame with extracted information.

```python
from pydantic import BaseModel, Field
from typing import Literal, List, Optional
import pandas as pd
from dataframeit import dataframeit

# 1. Define Pydantic model
class Analysis(BaseModel):
    sentiment: Literal['positive', 'negative', 'neutral']
    confidence: Literal['high', 'medium', 'low']
    topics: List[str] = Field(description="Main topics")
    summary: str = Field(description="Summary in one sentence")

# 2. Data
df = pd.DataFrame({
    'text': [
        'Excellent product! Fast delivery.',
        'Terrible service, took too long.',
        'Ok, nothing special.'
    ]
})

# 3. Process
result = dataframeit(
    df,
    Analysis,
    "Analyze the text and extract the requested information.",
    text_column='text'
)

# 4. Result contains columns: text, sentiment, confidence, topics, summary
print(result)
```

--------------------------------

### Configure OpenAI API Key

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/installation.md

Sets the OPENAI_API_KEY environment variable for authenticating with OpenAI. The API key can be obtained from the OpenAI Platform.

```bash
export OPENAI_API_KEY="your-openai-key"
```

--------------------------------

### Configure Anthropic API Key

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/installation.md

Sets the ANTHROPIC_API_KEY environment variable for authenticating with Anthropic. The API key can be obtained from the Anthropic Console.

```bash
export ANTHROPIC_API_KEY="your-anthropic-key"
```

--------------------------------

### Configure Parallel Processing and Token Tracking in DataFrameIt (Python)

Source: https://context7.com/bdcdo/dataframeit/llms.txt

Illustrates how to optimize `dataframeit()` performance using parallel processing and monitor costs with token tracking. The example sets `parallel_requests` for concurrent operations and `track_tokens` for usage monitoring, suitable for large datasets.

```python
from pydantic import BaseModel
from typing import Literal
import pandas as pd
from dataframeit import dataframeit

class Analysis(BaseModel):
    category: Literal['tech', 'health', 'finance', 'other']
    relevance: Literal['high', 'medium', 'low']

df = pd.DataFrame({'texto': [f'Text {i}' for i in range(100)]})

# High-speed processing with 5 parallel workers
result = dataframeit(
    df, Analysis, "Categorize: {texto}",
    parallel_requests=5,        # 5 concurrent requests
    track_tokens=True           # Monitor token usage
)
```
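
To picture what five concurrent requests mean, the standard library's `ThreadPoolExecutor` can run a stand-in function over the rows. This is a conceptual sketch only: `fake_llm_call` is a hypothetical placeholder for a network request, and nothing here reflects how DataFrameIt schedules work internally.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_llm_call(text: str) -> dict:
    """Hypothetical stand-in for one LLM request."""
    time.sleep(0.01)  # simulate network latency
    return {"text": text, "category": "other"}

texts = [f"Text {i}" for i in range(20)]

# max_workers plays the role of parallel_requests=5
with ThreadPoolExecutor(max_workers=5) as pool:
    # pool.map returns results in input order
    results = list(pool.map(fake_llm_call, texts))

print(len(results), results[0])
```

Because most of each request is spent waiting on the network, threads overlap that waiting time; the practical ceiling is usually the provider's rate limit rather than local CPU.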

--------------------------------

### DataFrameIt Basic Search Example (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/web-search.md

A concise example of enabling basic web search within DataFrameIt. The `use_search=True` parameter activates the search functionality for data enrichment.

```python
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    use_search=True
)
```

--------------------------------

### Install DataFrameIt with Search Dependency

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/web-search.md

Install the necessary dependencies for DataFrameIt to enable web search functionality. This involves using pip to install the 'dataframeit' package with the 'search' extra or the 'langchain-tavily' package directly.

```bash
pip install dataframeit[search]
# or
pip install langchain-tavily
```

--------------------------------

### Create Pydantic Models for Legal Analysis (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/examples/index.md

This snippet illustrates nested Pydantic models for legal case analysis. It defines 'Party' and 'Decision', demonstrating a list of nested objects and literal types.

```python
from pydantic import BaseModel
from typing import Literal, List

class Party(BaseModel):
    name: str
    type: Literal['plaintiff', 'defendant']

class Decision(BaseModel):
    parties: List[Party]
    outcome: Literal['granted', 'denied']
```
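
Nested models validate recursively, so plain dictionaries (such as those parsed from a JSON reply) are converted into typed objects. A short check using plain Pydantic, independent of DataFrameIt; the payload is hypothetical.

```python
from typing import List, Literal
from pydantic import BaseModel

class Party(BaseModel):
    name: str
    type: Literal['plaintiff', 'defendant']

class Decision(BaseModel):
    parties: List[Party]
    outcome: Literal['granted', 'denied']

# Hypothetical payload, shaped like the JSON an LLM might return
data = {
    "parties": [
        {"name": "Alice", "type": "plaintiff"},
        {"name": "Acme Corp", "type": "defendant"},
    ],
    "outcome": "granted",
}

decision = Decision(**data)  # nested dicts become Party instances
print(decision.parties[0].name, decision.outcome)
```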

--------------------------------

### Passing API Key Directly to DataFrameIt

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/providers.md

This Python example shows an alternative way to supply the API key: passing it directly to `dataframeit` via `api_key` instead of relying on an environment variable. Avoid committing a hardcoded key to version control.

```python
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='openai',
    model='gpt-5.2-mini',
    api_key='sk-...'  # Your key directly
)
```
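
If a key must be passed explicitly, it can still be read from the environment at call time rather than written into the source. A minimal sketch; `require_api_key` is a hypothetical helper and not part of DataFrameIt.

```python
import os

def require_api_key(var_name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key

# Usage (assumes OPENAI_API_KEY is exported in the shell):
# result = dataframeit(..., provider='openai', api_key=require_api_key("OPENAI_API_KEY"))
```

Failing early with a named variable makes a missing key easier to diagnose than an authentication error halfway through a run.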

--------------------------------

### DataFrameIt Search Per Field Example (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/web-search.md

Illustrates how to perform a separate web search for each field in the Pydantic model using DataFrameIt. This is useful when dealing with models that have numerous fields, potentially improving search relevance.

```python
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    use_search=True,
    search_per_field=True  # One search per model field
)
```

--------------------------------

### Defining Pydantic Models for DataFrameIt

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/reference/llm-reference.md

Provides examples of defining Pydantic models for use with DataFrameIt. It covers defining fields with literal values, optional fields, lists, and nested model structures, all of which dictate the output schema for LLM extraction.

```python
from pydantic import BaseModel, Field
from typing import Literal, List, Optional

# Fields with fixed values
class Example(BaseModel):
    category: Literal['A', 'B', 'C']

# Optional fields
class Example(BaseModel):
    notes: Optional[str] = Field(default=None, description="Observations")

# Lists
class Example(BaseModel):
    tags: List[str] = Field(description="List of tags")

# Nested models
class Address(BaseModel):
    city: str
    state: str

class Person(BaseModel):
    name: str
    address: Optional[Address] = None
```

--------------------------------

### Create Pydantic Model for Sentiment Analysis (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/examples/index.md

This snippet demonstrates how to define a basic Pydantic model for sentiment analysis, specifying the 'sentiment' field with allowed literal values. It serves as an introduction to data modeling within DataFrameIt.

```python
from pydantic import BaseModel
from typing import Literal

class Sentiment(BaseModel):
    sentiment: Literal['positive', 'negative', 'neutral']
```
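
The effect of `Literal` can be seen without calling an LLM: Pydantic rejects any value outside the allowed set. A short check using the same model shape (plain Pydantic, independent of DataFrameIt):

```python
from typing import Literal
from pydantic import BaseModel, ValidationError

class Sentiment(BaseModel):
    sentiment: Literal['positive', 'negative', 'neutral']

ok = Sentiment(sentiment='positive')
print(ok.sentiment)

try:
    Sentiment(sentiment='happy')  # not one of the allowed values
except ValidationError:
    rejected = True
else:
    rejected = False

print("rejected:", rejected)
```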

--------------------------------

### Real Example: Legal Analysis Pydantic Model

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/structured-output.md

Provides a comprehensive Pydantic model (`CourtDecision`) for extracting structured information from legal documents, showcasing nested models for parties and claims, along with literal types and optional fields. It also includes a prompt and the corresponding `dataframeit` call.

```python
from pydantic import BaseModel, Field
from typing import List, Optional, Literal
from dataframeit import dataframeit

class Party(BaseModel):
    """Party involved in the case."""
    name: str = Field(description="Full name of the party")
    type: Literal['plaintiff', 'defendant', 'third_party'] = Field(description="Party type")
    tax_id: Optional[str] = Field(default=None, description="Tax ID")

class Claim(BaseModel):
    """Claim made in the case."""
    description: str = Field(description="Claim description")
    amount: Optional[float] = Field(default=None, description="Amount in USD")
    granted: Optional[bool] = Field(default=None, description="Whether it was granted")

class CourtDecision(BaseModel):
    """Complete analysis of a court decision."""

    # Identification
    case_number: str = Field(description="Case number")
    court: str = Field(description="Court (e.g., Supreme Court, District Court)")
    decision_date: str = Field(description="Decision date (YYYY-MM-DD)")

    # Parties
    parties: List[Party] = Field(description="Parties involved")

    # Merit
    decision_type: Literal['judgment', 'ruling', 'order', 'interlocutory']
    outcome: Literal['granted', 'denied', 'partially_granted', 'dismissed']

    # Claims
    claims: List[Claim] = Field(description="Claims analyzed")

    # Summary
    summary: str = Field(description="Decision summary in up to 100 words")
    legal_grounds: List[str] = Field(description="Main legal grounds")

PROMPT = """
Analyze the court decision below and extract all relevant information.
Be precise with dates, amounts, and names.
If information is not available, use null.
"""

# df_decisions is assumed to be an existing DataFrame with the decisions in a 'text' column
result = dataframeit(df_decisions, CourtDecision, PROMPT, text_column='text')
```

--------------------------------

### Configure DataFrameIt Retry Parameters (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/error-handling.md

This code example shows how to configure the automatic retry mechanism in DataFrameIt using exponential backoff. It allows setting maximum retries, base delay, and maximum delay to manage transient errors during processing. Default values are provided for context.

```python
result = dataframeit(
    df,
    Model,
    PROMPT,
    text_column='text',
    max_retries=5,        # Maximum attempts (default: 3)
    base_delay=2.0,       # Initial delay in seconds (default: 1.0)
    max_delay=60.0        # Maximum delay in seconds (default: 30.0)
)
```
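
Exponential backoff typically doubles the wait after each failed attempt until a cap is reached. The sketch below shows that common scheme using the parameter names from the example; whether DataFrameIt uses exactly this formula (or adds jitter) is an assumption, and `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(max_retries: int, base_delay: float, max_delay: float) -> list:
    """Delay before each retry: base_delay * 2**attempt, capped at max_delay."""
    return [min(base_delay * (2 ** attempt), max_delay) for attempt in range(max_retries)]

print(backoff_delays(max_retries=5, base_delay=2.0, max_delay=60.0))
# [2.0, 4.0, 8.0, 16.0, 32.0]
print(backoff_delays(max_retries=5, base_delay=2.0, max_delay=10.0))
# [2.0, 4.0, 8.0, 10.0, 10.0]
```

Under this scheme, `max_delay` only matters once `base_delay * 2**attempt` exceeds it, so a low cap mostly affects the later retries.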

--------------------------------

### Implement Rate Limiting in DataFrameIt (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/error-handling.md

This Python example shows how to prevent rate limit errors by configuring a delay between requests when using DataFrameIt. The `rate_limit_delay` parameter adds a specified pause, ensuring that API rate limits are not exceeded and processing is smoother.

```python
# Prevents rate limit errors
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    rate_limit_delay=1.0  # 1 second between requests
)
```
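
A fixed delay between requests puts a floor on total runtime, which is worth estimating before a large run. This is a back-of-the-envelope sketch; it assumes sequential requests with the pause applied between consecutive calls, and `min_runtime_seconds` is a hypothetical helper.

```python
def min_runtime_seconds(n_rows: int, rate_limit_delay: float,
                        seconds_per_call: float = 0.0) -> float:
    """Lower bound on runtime: n calls plus a pause between consecutive calls."""
    if n_rows <= 0:
        return 0.0
    pauses = (n_rows - 1) * rate_limit_delay
    return n_rows * seconds_per_call + pauses

# 1,000 rows, 1 s between requests, 2 s per call (assumed latency)
total = min_runtime_seconds(1000, 1.0, seconds_per_call=2.0)
print(f"{total / 60:.1f} minutes")
```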

--------------------------------

### Define Pydantic Model for Sentiment Analysis

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/quickstart.md

Defines a Pydantic model named 'Sentiment' with specific literal values for sentiment and confidence. This model ensures that the LLM returns valid, predefined categories for sentiment analysis.

```python
from pydantic import BaseModel, Field
from typing import Literal

class Sentiment(BaseModel):
    """Sentiment analysis of a text."""
    sentiment: Literal['positive', 'negative', 'neutral'] = Field(
        description="Overall sentiment of the text"
    )
    confidence: Literal['high', 'medium', 'low'] = Field(
        description="Confidence level in the classification"
    )
```

--------------------------------

### Configuring LLM Providers in DataFrameIt

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/reference/llm-reference.md

Shows how to specify different LLM providers (Google Gemini, OpenAI, Anthropic) when using the DataFrameIt function. It demonstrates passing the `provider` argument and optionally specifying a particular `model` and `model_kwargs` for customization.

```python
# Google Gemini (default)
result = dataframeit(df, Model, PROMPT, text_column='text')

# OpenAI
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='openai',
    model='gpt-5.2-mini'
)

# Anthropic
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='anthropic',
    model='claude-sonnet-4.5'
)

# With extra parameters
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='openai',
    model='gpt-5.2-mini',
    model_kwargs={'temperature': 0.2}
)
```

--------------------------------

### DataFrameIt LLM Provider Configuration

Source: https://context7.com/bdcdo/dataframeit/llms.txt

Demonstrates how to configure and use different LLM providers (OpenAI, Anthropic, etc.) with DataFrameIt by specifying the `provider` and `model` parameters. Ensure the corresponding API keys are set as environment variables.

````APIDOC
## dataframeit() - With Different LLM Providers

### Description
Allows specifying different LLM providers and models for text processing. Requires setting the appropriate API key as an environment variable for the chosen provider.

### Method
`dataframeit` function call

### Parameters
#### Core Parameters
- **df** (pandas.DataFrame or polars.DataFrame or list or dict) - Input data.
- **output_model** (pydantic.BaseModel) - Pydantic model for output structure.
- **prompt_template** (str) - Prompt template.

#### LLM Provider Configuration
- **provider** (str) - Name of the LLM provider (e.g., 'google_genai', 'openai', 'anthropic').
- **model** (str) - Name of the specific model to use (e.g., 'gemini-3.0-flash', 'gpt-5.2-mini', 'claude-sonnet-4.5').
- **model_kwargs** (dict) - Additional parameters to pass to the LLM model (e.g., `{'temperature': 0.2, 'max_tokens': 1000}`).

### Environment Variables
- `GOOGLE_API_KEY`: For Google Gemini.
- `OPENAI_API_KEY`: For OpenAI models.
- `ANTHROPIC_API_KEY`: For Anthropic Claude models.

### Request Example (OpenAI)
```python
from pydantic import BaseModel
from typing import Literal
import pandas as pd
from dataframeit import dataframeit

class Sentiment(BaseModel):
    sentiment: Literal['positive', 'negative', 'neutral']

df = pd.DataFrame({'texto': ['Great service!', 'Terrible experience.']})

# export OPENAI_API_KEY="your-key"
result = dataframeit(
    df,
    Sentiment,
    "Analyze sentiment: {texto}",
    provider='openai',
    model='gpt-5.2-mini'
)

# With custom model parameters
result = dataframeit(
    df,
    Sentiment,
    "Analyze sentiment: {texto}",
    provider='openai',
    model='gpt-5.2',
    model_kwargs={
        'temperature': 0.2,
        'max_tokens': 1000
    }
)
```

### Request Example (Anthropic Claude)
```python
from pydantic import BaseModel
from typing import Literal
import pandas as pd
from dataframeit import dataframeit

class Sentiment(BaseModel):
    sentiment: Literal['positive', 'negative', 'neutral']

df = pd.DataFrame({'texto': ['Great service!', 'Terrible experience.']})

# export ANTHROPIC_API_KEY="your-key"
result = dataframeit(
    df,
    Sentiment,
    "Analyze sentiment: {texto}",
    provider='anthropic',
    model='claude-sonnet-4.5'
)
```

### Response
#### Success Response (200)
Returns the input DataFrame enriched with the extracted fields according to the specified Pydantic model.
````

--------------------------------

### Use Different LLM Providers with DataFrameIt (Python)

Source: https://context7.com/bdcdo/dataframeit/llms.txt

Shows how to configure and use various LLM providers (Google Gemini, OpenAI, Anthropic) with the `dataframeit()` function. It highlights setting the `provider` and `model` parameters and passing custom model arguments. Ensure the respective API keys are set as environment variables.

```python
from pydantic import BaseModel
from typing import Literal
import pandas as pd
from dataframeit import dataframeit

class Sentiment(BaseModel):
    sentiment: Literal['positive', 'negative', 'neutral']

df = pd.DataFrame({'texto': ['Great service!', 'Terrible experience.']})

# Google Gemini (default)
# export GOOGLE_API_KEY="your-key"
result = dataframeit(
    df, Sentiment, "Analyze sentiment: {texto}",
    provider='google_genai',
    model='gemini-3.0-flash'
)

# OpenAI
# export OPENAI_API_KEY="your-key"
result = dataframeit(
    df, Sentiment, "Analyze sentiment: {texto}",
    provider='openai',
    model='gpt-5.2-mini'
)

# Anthropic Claude
# export ANTHROPIC_API_KEY="your-key"
result = dataframeit(
    df, Sentiment, "Analyze sentiment: {texto}",
    provider='anthropic',
    model='claude-sonnet-4.5'
)

# With custom model parameters
result = dataframeit(
    df, Sentiment, "Analyze sentiment: {texto}",
    provider='openai',
    model='gpt-5.2',
    model_kwargs={
        'temperature': 0.2,
        'max_tokens': 1000
    }
)
```

--------------------------------

### Using Different Providers with DataFrameIt (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/basic-usage.md

Shows how to switch DataFrameIt between AI model providers: Google Gemini (the default), OpenAI, and Anthropic Claude. Pass `provider` and `model` to select one.

```python
# Google Gemini (default)
result = dataframeit(df, Model, PROMPT, text_column='text')
```

```python
# OpenAI
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='openai',
    model='gpt-5.2-mini'
)
```

```python
# Anthropic Claude
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='anthropic',
    model='claude-sonnet-4.5'
)
```

--------------------------------

### Load DataFrame with Different Normalization Options

Source: https://context7.com/bdcdo/dataframeit/llms.txt

Demonstrates loading data from Excel files with automatic normalization (JSON strings to Python objects), precise normalization using a Pydantic model, or no normalization at all. Also shows supported file formats and passing additional pandas arguments.

```python
from dataframeit import read_df
from pydantic import BaseModel
from typing import List

class Extraction(BaseModel):
    keywords: List[str]
    title: str

# Load with automatic normalization (JSON strings -> Python objects)
df_loaded = read_df('results.xlsx')
print(type(df_loaded['keywords'].iloc[0]))  # <class 'list'>

# Load with Pydantic model for precise normalization
df_loaded = read_df('results.xlsx', model=Extraction)

# Load without normalization
df_raw = read_df('results.xlsx', normalize=False)
print(type(df_raw['keywords'].iloc[0]))  # <class 'str'>

# Supported formats
df = read_df('data.xlsx')      # Excel
df = read_df('data.csv')       # CSV
df = read_df('data.parquet')   # Parquet
df = read_df('data.json')      # JSON

# Pass additional pandas arguments
df = read_df('data.csv', encoding='utf-8', sep=';')
df = read_df('data.xlsx', sheet_name='Sheet2')
```
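
Normalization matters because spreadsheet and CSV cells hold only scalar values, so a list column is typically written out as a JSON string. The sketch below illustrates the idea with the standard library; it is not `read_df`'s implementation, and `normalize_cell` is a hypothetical helper.

```python
import json

def normalize_cell(value):
    """Parse JSON-looking strings into Python objects; leave everything else alone."""
    if isinstance(value, str) and value.strip()[:1] in ("[", "{"):
        try:
            return json.loads(value)
        except json.JSONDecodeError:
            return value  # looked like JSON but was not valid
    return value

raw = '["pricing", "delivery"]'  # how a list may look after a round trip through Excel
print(type(raw).__name__, "->", type(normalize_cell(raw)).__name__)
```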

--------------------------------

### Incremental Processing with DataFrameIt in Python

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/reference/llm-reference.md

Demonstrates incremental processing with DataFrameIt. It shows how to save intermediate results to an Excel file, then load them and continue processing from that saved state.

```python
import pandas as pd
from dataframeit import dataframeit

# Process and save
result = dataframeit(df, Model, PROMPT, text_column='text', resume=True)
result.to_excel('partial.xlsx', index=False)

# Load and continue
df = pd.read_excel('partial.xlsx')
result = dataframeit(df, Model, PROMPT, text_column='text', resume=True)
```
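
The resume pattern boils down to skipping rows that already have output. This is a conceptual sketch with plain dictionaries; how DataFrameIt actually marks completed rows is not shown in the source, so the `None` check and the `process_pending` helper are assumptions for illustration.

```python
def process_pending(rows, extract, field="sentiment"):
    """Run extract only on rows whose output field is still empty."""
    processed = 0
    for row in rows:
        if row.get(field) is None:
            row[field] = extract(row["text"])
            processed += 1
    return processed

rows = [
    {"text": "Great!", "sentiment": "positive"},  # done in an earlier run
    {"text": "Awful.", "sentiment": None},        # still pending
]
count = process_pending(rows, extract=lambda text: "negative")
print(count, rows[1]["sentiment"])
```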

--------------------------------

### Process DataFrame with LLM and Pydantic Model (Python)

Source: https://context7.com/bdcdo/dataframeit/llms.txt

Demonstrates the core functionality of `dataframeit()` to process a pandas DataFrame using a Pydantic model for structured output. It takes input data, a Pydantic model, and a prompt template to extract and append structured information. The default LLM provider is Google Gemini.

```python
from pydantic import BaseModel, Field
from typing import Literal, List
import pandas as pd
from dataframeit import dataframeit

# Define output structure with Pydantic
class ProductReview(BaseModel):
    sentiment: Literal['positive', 'negative', 'neutral']
    confidence: Literal['high', 'medium', 'low']
    keywords: List[str] = Field(description="Key topics mentioned")
    summary: str = Field(description="One sentence summary")

# Input data
df = pd.DataFrame({
    'texto': [
        'Amazing product! Exceeded all expectations, fast shipping.',
        'Terrible quality, broke after one week. Never buying again.',
        'Decent product for the price. Nothing special but works fine.'
    ]
})

# Process with default settings (Google Gemini)
result = dataframeit(
    df,
    ProductReview,
    "Analyze the following product review: {texto}"
)

# Result includes original text + extracted columns
print(result[['texto', 'sentiment', 'confidence', 'keywords', 'summary']])
```

--------------------------------

### Prompt Templating for DataFrameIt in Python

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/reference/llm-reference.md

Illustrates two ways to define a prompt template for DataFrameIt. With a plain string, the row text is added at the end of the prompt; with a `{texto}` placeholder, you control where the text appears in the prompt.

```python
# Simple - text added at the end
PROMPT = "Classify the sentiment of the text."

# With placeholder - control the position
PROMPT = """
Analyze the document below:

{texto}

Extract the requested information.
"""
```
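
The two styles can be pictured as one small rendering step. The sketch below is an illustration only: `render_prompt` is a hypothetical helper, and the blank-line separator used in the "append" case is an assumption rather than the library's documented behavior.

```python
def render_prompt(template: str, text: str, placeholder: str = "{texto}") -> str:
    """Insert `text` at the placeholder if present, otherwise append it."""
    if placeholder in template:
        # Placeholder style: the author decides where the text goes.
        return template.replace(placeholder, text)
    # Simple style: the text is added after the instructions.
    return f"{template.rstrip()}\n\n{text}"

simple = render_prompt("Classify the sentiment of the text.", "Great product!")
placed = render_prompt("Analyze:\n{texto}\nExtract the information.", "Great product!")

print(simple)
print("---")
print(placed)
```

`str.replace` is used rather than `str.format` so that other curly braces in a prompt (for example, JSON snippets) do not raise a formatting error.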

--------------------------------

### DataFrameIt Performance Tuning: Parallelism and Rate Limiting

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/reference/llm-reference.md

Explains how to optimize DataFrameIt's performance by enabling parallel requests and configuring rate limit delays. This helps in processing large datasets more efficiently and avoiding API rate limit errors.

```python
# Parallel processing
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    parallel_requests=5  # 5 simultaneous workers
)

# Rate limiting (prevents 429 error)
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    rate_limit_delay=1.0  # 1 second between requests
)

# Combined
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    parallel_requests=5,
    rate_limit_delay=0.5
)
```
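
For intuition about what "5 simultaneous workers" means, here is a generic thread-pool sketch built on the standard library. It is not DataFrameIt's implementation: `fake_llm_call` is a stand-in for a real request, and the short sleep only mimics a per-request delay.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_llm_call(text: str, delay: float = 0.01) -> dict:
    """Stand-in for an LLM request; sleeps briefly to mimic latency or a delay."""
    time.sleep(delay)
    return {"text": text, "length": len(text)}

texts = [f"Text {i}" for i in range(10)]

# Up to 5 calls run at the same time; map() returns results in input order.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fake_llm_call, texts))

print(len(results), "rows processed")
```

Threads suit this kind of workload because the time is spent waiting on the network rather than on computation.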

--------------------------------

### Token Tracking and Cost Calculation

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/performance.md

Set `track_tokens=True` to record input and output token usage per row and in total. The call adds `_input_tokens`, `_output_tokens`, and `_total_tokens` columns to the DataFrame, so you can estimate cost by multiplying the token counts by provider-specific prices.

```python
result = dataframeit(
    df,
    Model,
    PROMPT,
    text_column='text',
    track_tokens=True
)

# At the end, displays:
# ============================================================
# TOKEN USAGE STATISTICS
# ============================================================
# Model: gemini-3.0-flash
# Total tokens: 15,432
#   • Input:  12,345 tokens
#   • Output: 3,087 tokens
# ============================================================

```

```python
result = dataframeit(df, Model, PROMPT, text_column='text', track_tokens=True)

# Example: Gemini 2.0 Flash prices
price_input = 0.075 / 1_000_000   # $0.075 per 1M tokens
price_output = 0.30 / 1_000_000   # $0.30 per 1M tokens

cost_input = result['_input_tokens'].sum() * price_input
cost_output = result['_output_tokens'].sum() * price_output
total_cost = cost_input + cost_output

print(f"Estimated cost: ${total_cost:.4f}")
```
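
As a worked check of the arithmetic, the sketch below applies the example prices above to the totals shown in the sample statistics (12,345 input tokens and 3,087 output tokens). `estimate_cost` is a hypothetical helper, and the prices are the illustrative figures from this section, not current provider pricing.

```python
def estimate_cost(input_tokens, output_tokens,
                  price_in_per_million, price_out_per_million):
    """Return the estimated cost in dollars for the given token counts."""
    cost_in = input_tokens * price_in_per_million / 1_000_000
    cost_out = output_tokens * price_out_per_million / 1_000_000
    return cost_in + cost_out

# Token totals from the sample statistics, prices from the example above.
total = estimate_cost(12_345, 3_087, 0.075, 0.30)
print(f"Estimated cost: ${total:.4f}")
```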

--------------------------------

### Input Types for DataFrameIt (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/basic-usage.md

Demonstrates using DataFrameIt with different input data structures: lists, dictionaries, and pandas Series. Each input type is processed to produce a DataFrame, maintaining or adapting the index as appropriate. Requires pandas and dataframeit.

```python
# With List
texts = ['Text 1', 'Text 2', 'Text 3']
result = dataframeit(texts, Sentiment, PROMPT)
# Returns DataFrame with numeric index
```

```python
# With Dictionary
documents = {
    'doc_001': 'Content of document 1',
    'doc_002': 'Content of document 2',
}
result = dataframeit(documents, Sentiment, PROMPT)
# Returns DataFrame with keys as index
```

```python
# With Series
series = pd.Series(['Text A', 'Text B'], index=['id_1', 'id_2'])
result = dataframeit(series, Sentiment, PROMPT)
# Preserves original index
```

--------------------------------

### DataFrameIt Supported Input Data Types

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/reference/llm-reference.md

Illustrates how to use DataFrameIt with various input data structures, including pandas DataFrames, lists, dictionaries, and pandas Series. `text_column` is required for DataFrames and is not needed for the other types.

```python
# DataFrame (requires text_column)
df = pd.DataFrame({'text': ['A', 'B']})
result = dataframeit(df, Model, PROMPT, text_column='text')

# List (no text_column needed)
texts = ['Text 1', 'Text 2']
result = dataframeit(texts, Model, PROMPT)

# Dictionary (keys become index)
docs = {'id1': 'Text 1', 'id2': 'Text 2'}
result = dataframeit(docs, Model, PROMPT)

# Series (preserves index)
series = pd.Series(['A', 'B'], index=['x', 'y'])
result = dataframeit(series, Model, PROMPT)
```
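
The index behavior noted in the comments above can be pictured as a normalization step that pairs each text with an index label. The sketch below covers only lists and dictionaries with the standard library; `to_indexed_texts` is a hypothetical helper, not a DataFrameIt function.

```python
def to_indexed_texts(data):
    """Return (index, text) pairs for a list or dict of texts."""
    if isinstance(data, dict):
        # Dictionary keys become the index labels.
        return list(data.items())
    if isinstance(data, (list, tuple)):
        # Lists get a numeric index starting at 0.
        return list(enumerate(data))
    raise TypeError(f"Unsupported input type: {type(data).__name__}")

from_list = to_indexed_texts(['Text 1', 'Text 2'])
from_dict = to_indexed_texts({'id1': 'Text 1', 'id2': 'Text 2'})

print(from_list)
print(from_dict)
```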

--------------------------------

### Basic DataFrameIt Web Search Usage (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/web-search.md

Demonstrates basic usage of DataFrameIt with web search enabled. It defines a Pydantic model for company information, prepares a pandas DataFrame, and then calls `dataframeit()` with `use_search=True` to enrich the data using web searches.

```python
from pydantic import BaseModel, Field
from typing import Literal
import pandas as pd
from dataframeit import dataframeit

class CompanyInfo(BaseModel):
    sector: Literal['technology', 'health', 'finance', 'retail', 'other']
    description: str = Field(description="Brief company description")
    founded: str = Field(description="Year founded, if found")

# Data with company names
df = pd.DataFrame({
    'text': ['Microsoft', 'Stripe', 'DoorDash']
})

PROMPT = """
Based on available information and web search,
extract information about the mentioned company.
"""

# Enable web search with use_search=True
result = dataframeit(
    df,
    CompanyInfo,
    PROMPT,
    text_column='text',
    use_search=True,      # Enable web search
    max_results=5         # Number of results per search
)
```

--------------------------------

### DataFrameIt Function Signature and Parameters

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/reference/llm-reference.md

Defines the core function signature for DataFrameIt, outlining all available parameters. These include data input, Pydantic model for schema, prompt template, text column specification, LLM provider and model selection, retry mechanisms, and optional web search integration.

```python
from dataframeit import dataframeit

result = dataframeit(
    data,                    # DataFrame, Series, list or dict
    questions,               # Pydantic model
    prompt,                  # Prompt template
    text_column='text',      # Column with texts (required for DataFrame)
    model='gemini-3.0-flash',
    provider='google_genai', # 'google_genai', 'openai', 'anthropic'
    resume=True,             # Continue from where it stopped
    parallel_requests=1,     # Parallel workers
    rate_limit_delay=0.0,    # Delay between requests (seconds)
    max_retries=3,           # Retry attempts on error
    track_tokens=True,       # Track token usage
    api_key=None,            # API key (uses env var if None)
    model_kwargs=None,       # Extra parameters (temperature, etc)
    # Web search (requires TAVILY_API_KEY)
    use_search=False,        # Enable web search
    search_per_field=False,  # Separate search per field
    max_results=5,           # Results per search
    search_depth='basic',    # 'basic' or 'advanced'
)
```

--------------------------------

### Text Summarization with Python

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/basic-usage.md

Generates a concise summary, key points, and main topic from text data using a pandas DataFrame and a Pydantic model. Requires pydantic and dataframeit. The output is structured by the Summary model.

```python
from pydantic import BaseModel, Field
from dataframeit import dataframeit

class Summary(BaseModel):
    summary: str = Field(description="Summary in up to 50 words")
    key_points: list[str] = Field(description="List of 3-5 main points")
    main_topic: str = Field(description="Central topic in one word")

PROMPT = """
Analyze the text and extract a concise summary.
Identify the main points and central topic.
"""

result = dataframeit(df, Summary, PROMPT, text_column='text')
```

--------------------------------

### Economy Configuration for DataFrameIt

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/performance.md

Focuses on cost-effectiveness by running tasks sequentially (`parallel_requests=1`), applying a high `rate_limit_delay`, and selecting a cheaper model like `gemini-3.0-flash`. Token tracking is enabled to monitor exact usage.

```python
result = dataframeit(
    df,
    Model,
    PROMPT,
    text_column='text',
    parallel_requests=1,      # Sequential
    rate_limit_delay=1.5,     # High delay
    model='gemini-3.0-flash', # Cheap model
    track_tokens=True
)
```

--------------------------------

### Web Search Integration with dataframeit()

Source: https://context7.com/bdcdo/dataframeit/llms.txt

Enrich data by integrating real-time web search using Tavily. Configure search depth, number of results, and per-field searching for detailed information retrieval. Requires TAVILY_API_KEY.

```python
from pydantic import BaseModel, Field
from typing import Literal, List
import pandas as pd
from dataframeit import dataframeit

# export TAVILY_API_KEY="your-tavily-key"

class CompanyInfo(BaseModel):
    sector: Literal['technology', 'healthcare', 'finance', 'retail', 'other']
    description: str = Field(description="Brief company description")
    founded: str = Field(description="Year founded, if found")
    headquarters: str = Field(description="Company headquarters location")

df = pd.DataFrame({
    'texto': ['Microsoft', 'Nubank', 'Shopify']
})

# Basic web search
result = dataframeit(
    df, CompanyInfo,
    "Research and extract information about the company: {texto}",
    use_search=True,            # Enable Tavily web search
    max_results=5               # Number of search results per query
)

# Advanced search with more depth
result = dataframeit(
    df, CompanyInfo,
    "Research the company: {texto}",
    use_search=True,
    search_depth='advanced',    # 'basic' (1 credit) or 'advanced' (2 credits)
    max_results=10
)

# Search per field (separate search for each Pydantic field)
class DetailedInfo(BaseModel):
    financial_data: str = Field(description="Recent financial performance")
    products: List[str] = Field(description="Main products or services")
    competitors: List[str] = Field(description="Main competitors")

result = dataframeit(
    df, DetailedInfo,
    "Research: {texto}",
    use_search=True,
    search_per_field=True       # Separate search for each field
)

# Search metrics tracked in output
print(f"Search credits used: {result['_search_credits'].sum()}")
print(f"Total searches: {result['_search_count'].sum()}")
```
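
Before running a large job, it can help to estimate search credit usage. The sketch below uses the credit costs stated in the comments above (1 for `'basic'`, 2 for `'advanced'`) and assumes one search per row, or one search per field per row when `search_per_field=True`. That per-row count is an assumption; `estimate_search_credits` is a hypothetical helper, and the `_search_credits` column remains the authoritative figure.

```python
CREDITS_PER_SEARCH = {"basic": 1, "advanced": 2}

def estimate_search_credits(n_rows, n_fields=1,
                            search_depth="basic", search_per_field=False):
    """Rough upper-level estimate of search credits for a run."""
    searches_per_row = n_fields if search_per_field else 1
    return n_rows * searches_per_row * CREDITS_PER_SEARCH[search_depth]

basic = estimate_search_credits(3)
advanced = estimate_search_credits(3, search_depth="advanced")
per_field = estimate_search_credits(3, n_fields=3, search_per_field=True)

print(basic, advanced, per_field)
```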

--------------------------------

### DataFrameIt Parallel Processing and Performance

Source: https://context7.com/bdcdo/dataframeit/llms.txt

Details on optimizing processing speed using parallel requests, rate limiting, and automatic retries with exponential backoff. Includes token tracking for cost monitoring.

````APIDOC
## dataframeit() - Parallel Processing and Performance

### Description
Configure parameters to control processing speed, reliability, and cost. Enables parallel requests, automatic retries, and token usage tracking.

### Method
`dataframeit` function call

### Parameters
#### Core Parameters
- **df** (pandas.DataFrame or polars.DataFrame or list or dict) - Input data.
- **output_model** (pydantic.BaseModel) - Pydantic model for output structure.
- **prompt_template** (str) - Prompt template.

#### Performance and Reliability Parameters
- **parallel_requests** (int) - The number of concurrent requests to the LLM API. Increases throughput for large datasets. Defaults to 1.
- **track_tokens** (bool) - If True, token usage for each request will be tracked and included in the output, aiding cost monitoring. Defaults to False.
- **retry_strategy** (dict) - Configuration for automatic retries on API errors, including exponential backoff. Example: `{'max_attempts': 3, 'backoff_factor': 0.5}`.

### Request Example
```python
from pydantic import BaseModel
from typing import Literal
import pandas as pd
from dataframeit import dataframeit

class Analysis(BaseModel):
    category: Literal['tech', 'health', 'finance', 'other']
    relevance: Literal['high', 'medium', 'low']

df = pd.DataFrame({'texto': [f'Text {i}' for i in range(100)]})

# Process with 5 parallel workers and token tracking enabled
result = dataframeit(
    df,
    Analysis,
    "Categorize: {texto}",
    parallel_requests=5,        # Process up to 5 requests concurrently
    track_tokens=True           # Enable token usage monitoring
)

print(result)
```

### Response
#### Success Response (200)
Returns the input DataFrame enriched with extracted fields. If `track_tokens` is True, additional columns for prompt tokens, completion tokens, and total tokens will be included.
````

--------------------------------

### Configure Fields with Custom Prompts in dataframeit()

Source: https://context7.com/bdcdo/dataframeit/llms.txt

Configure custom prompts and search parameters for individual fields using Pydantic's `json_schema_extra`. This requires `search_per_field=True`. A field can replace the default prompt (`prompt`), append to it (`prompt_append`), and override search parameters such as `search_depth` and `max_results`.

```python
from pydantic import BaseModel, Field
import pandas as pd
from dataframeit import dataframeit

class MedicationInfo(BaseModel):
    # Standard field with default behavior
    active_ingredient: str = Field(description="Active ingredient of the medication")

    # Field with completely replaced prompt
    rare_disease: str = Field(
        description="Rare disease classification",
        json_schema_extra={
            "prompt": "Search Orphanet (orpha.net) for rare disease info. Analyze: {texto}"
        }
    )

    # Field with appended prompt instructions
    regulatory_status: str = Field(
        description="FDA approval status",
        json_schema_extra={
            "prompt_append": "Search ONLY FDA.gov for regulatory information."
        }
    )

    # Field with custom search parameters
    clinical_trials: str = Field(
        description="Relevant clinical trials",
        json_schema_extra={
            "prompt_append": "Find recent clinical trials (2020-2024).",
            "search_depth": "advanced",  # Override search depth
            "max_results": 10            # Override max results
        }
    )

df = pd.DataFrame({'texto': ['Pembrolizumab', 'Trastuzumab']})

result = dataframeit(
    df,
    MedicationInfo,
    "Analyze the medication: {texto}",
    use_search=True,
    search_per_field=True  # Required for per-field configuration
)
```
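
To make the per-field options concrete, the sketch below resolves an effective prompt and search settings from a field's `json_schema_extra` dictionary. The precedence (`prompt` wins over `prompt_append`) and the newline used when appending are assumptions for illustration; `resolve_field_config` is a hypothetical helper, not part of DataFrameIt.

```python
def resolve_field_config(base_prompt, extra=None,
                         default_depth="basic", default_max_results=5):
    """Combine a base prompt with per-field overrides from json_schema_extra."""
    extra = extra or {}
    if "prompt" in extra:
        prompt = extra["prompt"]                              # full replacement
    elif "prompt_append" in extra:
        prompt = f"{base_prompt}\n{extra['prompt_append']}"   # extra instructions
    else:
        prompt = base_prompt                                  # default behavior
    return {
        "prompt": prompt,
        "search_depth": extra.get("search_depth", default_depth),
        "max_results": extra.get("max_results", default_max_results),
    }

base = "Analyze the medication: {texto}"
default_cfg = resolve_field_config(base)
trials_cfg = resolve_field_config(base, {
    "prompt_append": "Find recent clinical trials (2020-2024).",
    "search_depth": "advanced",
    "max_results": 10,
})

print(default_cfg)
print(trials_cfg)
```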

--------------------------------

### Use a More Capable Model in DataFrameIt (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/error-handling.md

Shows how to switch to a more capable model when errors persist with complex prompts or data. Pass a different `model` value, such as `'gemini-2.5-pro'`, which may produce more reliable output than a smaller flash model.

```python
# If errors persist, try a more capable model
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    model='gemini-2.5-pro'  # More capable than flash
)
```

--------------------------------

### Configure Tavily API Key

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/web-search.md

Set the TAVILY_API_KEY environment variable with your Tavily API key. This key is required for authentication and enabling web search operations.

```bash
export TAVILY_API_KEY="your-tavily-key"
```

--------------------------------

### Rate-Limited and Balanced Processing with dataframeit()

Source: https://context7.com/bdcdo/dataframeit/llms.txt

Configure rate limiting and retry mechanisms for robust API interactions. The balanced approach uses parallel requests for efficiency on large datasets, with options to track token usage.

```python
from dataframeit import dataframeit

# Rate-limited processing (prevents 429 errors)
result = dataframeit(
    df, Analysis, "Categorize: {texto}",
    rate_limit_delay=1.0,       # 1 second between requests (60 req/min)
    max_retries=5,              # Retry failed requests up to 5 times
    base_delay=2.0,             # Initial retry delay
    max_delay=30.0              # Maximum retry delay
)

# Balanced approach for large datasets
result = dataframeit(
    df, Analysis, "Categorize: {texto}",
    parallel_requests=3,
    rate_limit_delay=0.5,
    track_tokens=True
)
```
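
The numbers in the first call can be sanity-checked with two small helpers. The conversion from a requests-per-minute limit to a delay follows the "1 second = 60 req/min" comment above and applies to sequential processing. The retry schedule assumes plain exponential doubling capped at `max_delay`; the library's exact backoff formula is not stated here, so treat both helper functions as hypothetical.

```python
def delay_for_rate_limit(requests_per_minute: float) -> float:
    """Seconds to wait between sequential requests to stay under a limit."""
    return 60.0 / requests_per_minute

def backoff_schedule(max_retries: int, base_delay: float, max_delay: float):
    """Assumed schedule: base_delay doubles on each retry, capped at max_delay."""
    return [min(base_delay * 2 ** attempt, max_delay)
            for attempt in range(max_retries)]

delay = delay_for_rate_limit(60)
schedule = backoff_schedule(max_retries=5, base_delay=2.0, max_delay=30.0)

print(delay)
print(schedule)
```

Under these assumptions, the settings in the first call would wait at most 60 seconds in total across all five retries of a single failing request.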