### Complete DataFrameIt Example with File Export

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/quickstart.md

A comprehensive example combining Pydantic model definition, data preparation, DataFrameIt processing, and saving the results to an Excel file. This illustrates a full workflow from data input to output.

```python
from pydantic import BaseModel, Field
from typing import Literal
import pandas as pd
from dataframeit import dataframeit

# 1. Pydantic Model
class Sentiment(BaseModel):
    sentiment: Literal['positive', 'negative', 'neutral']
    confidence: Literal['high', 'medium', 'low']

# 2. Data
df = pd.DataFrame({
    'text': [
        'Excellent product! Exceeded expectations.',
        'Terrible service, never buying again.',
        'Delivery ok, average product.'
    ]
})

# 3. Process
result = dataframeit(df, Sentiment, "Analyze the sentiment of the text.", text_column='text')

# 4. Save
result.to_excel('result.xlsx', index=False)
```

--------------------------------

### Install DataFrameIt with All LLM Provider Support

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/installation.md

Installs the DataFrameIt library with support for all available LLM providers. This command installs all optional extras.

```bash
pip install dataframeit[all]
```

--------------------------------

### Install DataFrameIt with OpenAI Support

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/installation.md

Installs the DataFrameIt library with support for OpenAI models. This command installs the 'openai' extra.

```bash
pip install dataframeit[openai]
```

--------------------------------

### Install DataFrameIt with Google Gemini Support

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/installation.md

Installs the DataFrameIt library with support for Google Gemini models. This command installs the 'google' extra.
```bash
pip install dataframeit[google]
```

--------------------------------

### Run Jupyter Notebook Server (Bash)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/examples/index.md

This command starts the Jupyter Notebook server, allowing you to access and run the example notebooks through your web browser. The 'example/' argument specifies the directory to serve.

```bash
jupyter notebook example/
```

--------------------------------

### Install DataFrameIt with Anthropic Support

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/installation.md

Installs the DataFrameIt library with support for Anthropic models. This command installs the 'anthropic' extra.

```bash
pip install dataframeit[anthropic]
```

--------------------------------

### Install DataFrameIt Dependencies (Bash)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/examples/index.md

These commands install the necessary Python dependencies for DataFrameIt, including optional support for Google services, and the Jupyter Notebook environment.

```bash
pip install dataframeit[google]
pip install jupyter
```

--------------------------------

### Verify DataFrameIt Installation

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/installation.md

A simple Python script to verify that DataFrameIt has been installed successfully. It imports the library and prints a success message.

```python
from dataframeit import dataframeit
print("DataFrameIt installed successfully!")
```

--------------------------------

### Install DataFrameIt with Google Gemini and Polars Support

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/installation.md

Installs the DataFrameIt library with support for Google Gemini models and the Polars data manipulation library. This is for users who prefer Polars over Pandas.
```bash
pip install dataframeit[google,polars]
```

--------------------------------

### Prepare Input Data for DataFrameIt

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/quickstart.md

Shows how to prepare input data for DataFrameIt using different formats: pandas DataFrame, a list of strings, or a dictionary. This allows flexibility in how you provide the text data to be processed.

```python
import pandas as pd

df = pd.DataFrame({
    'text': [
        'Excellent product! Exceeded expectations.',
        'Terrible service, never buying again.',
        'Delivery ok, average product.'
    ]
})
```

```python
texts = [
    'Excellent product! Exceeded expectations.',
    'Terrible service, never buying again.',
    'Delivery ok, average product.'
]
```

```python
texts = {
    'review_001': 'Excellent product! Exceeded expectations.',
    'review_002': 'Terrible service, never buying again.',
    'review_003': 'Delivery ok, average product.'
}
```

--------------------------------

### Install and Configure Cohere Provider for DataFrameIT

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/providers.md

This snippet covers the installation of the Cohere provider and setting the `COHERE_API_KEY`. It includes a Python example for calling `dataframeit` with a specified Cohere model.

```bash
pip install langchain-cohere
export COHERE_API_KEY="your-key"
```

```python
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='cohere',
    model='command-r-plus'
)
```

--------------------------------

### Clone DataFrameIt Repository (Bash)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/examples/index.md

This command clones the DataFrameIt repository from GitHub to your local machine. It's the first step in setting up the examples.
```bash
git clone https://github.com/bdcdo/dataframeit.git
cd dataframeit
```

--------------------------------

### Process Data with DataFrameIt

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/quickstart.md

Demonstrates the core functionality of DataFrameIt by processing a pandas DataFrame with a Pydantic model and a custom prompt. It specifies the text column for analysis and prints the resulting DataFrame.

```python
from dataframeit import dataframeit

result = dataframeit(
    df,                                    # Your data
    Sentiment,                             # Pydantic model
    "Analyze the sentiment of the text.",  # Prompt
    text_column='text'                     # Column name
)

print(result)
```

--------------------------------

### Install and Configure Mistral Provider for DataFrameIT

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/providers.md

This section provides instructions for installing the Mistral AI provider and setting the `MISTRAL_API_KEY`. A Python code example demonstrates how to use `dataframeit` with a Mistral model.

```bash
pip install langchain-mistralai
export MISTRAL_API_KEY="your-key"
```

```python
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='mistral',
    model='mistral-large-latest'
)
```

--------------------------------

### Install DataFrameIt with LLM Provider Support

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/index.md

These commands show how to install the dataframeit library with specific support for different LLM providers using pip. Users can choose to install support for Google Gemini, OpenAI, or Anthropic based on their needs. The `[provider]` syntax in pip allows for optional dependency installation.
```bash
pip install dataframeit[google]     # Google Gemini 3 (recommended)
```

```bash
pip install dataframeit[openai]     # OpenAI GPT-5
```

```bash
pip install dataframeit[anthropic]  # Anthropic Claude 4.5
```

--------------------------------

### Install and Configure Google Gemini Provider for DataFrameIT

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/providers.md

This snippet shows how to install the necessary package for Google Gemini support and set the API key using environment variables. It also provides Python code examples for using the `dataframeit` function with Google Gemini, both with default and explicit configurations, including passing model-specific keyword arguments.

```bash
pip install dataframeit[google]
export GOOGLE_API_KEY="your-key"
```

```python
# Default - no need to specify
result = dataframeit(df, Model, PROMPT, text_column='text')

# Explicit
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='google_genai',
    model='gemini-3.0-flash'
)

# With extra parameters
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='google_genai',
    model='gemini-2.5-pro',
    model_kwargs={
        'temperature': 0.2,
        'top_p': 0.9
    }
)
```

--------------------------------

### Install and Configure OpenAI Provider for DataFrameIT

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/providers.md

This section details the installation of the OpenAI provider for DataFrameIT and setting the `OPENAI_API_KEY` environment variable. It includes Python examples for calling `dataframeit` with different OpenAI models and configuring `model_kwargs` for advanced settings.
```bash
pip install dataframeit[openai]
export OPENAI_API_KEY="your-key"
```

```python
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='openai',
    model='gpt-5.2-mini'
)

# With advanced model
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='openai',
    model='gpt-5.2',
    model_kwargs={
        'temperature': 0.2
    }
)
```

--------------------------------

### Create Prompt Templates for LLM Interaction

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/concepts.md

Illustrates how to construct prompt templates for guiding the LLM. Templates can be simple strings or include placeholders like '{texto}' to dynamically insert input text.

```python
# Simple - text is automatically added at the end
PROMPT = "Classify the sentiment of the text."

# With placeholder - control where text appears
PROMPT = """
You are a specialized analyst.

Document:
{texto}

Extract the requested information from the document above.
"""
```

--------------------------------

### Install and Configure Anthropic Claude Provider for DataFrameIT

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/providers.md

Instructions for installing the Anthropic Claude provider and setting the `ANTHROPIC_API_KEY`. Python code examples demonstrate using `dataframeit` with various Claude models and specifying `model_kwargs`, such as `max_tokens`.
```bash
pip install dataframeit[anthropic]
export ANTHROPIC_API_KEY="your-key"
```

```python
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='anthropic',
    model='claude-sonnet-4.5'
)

# With max_tokens
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='anthropic',
    model='claude-opus-4.5',
    model_kwargs={
        'max_tokens': 4096
    }
)
```

--------------------------------

### Configure Google Gemini API Key

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/installation.md

Sets the GOOGLE_API_KEY environment variable for authenticating with Google Gemini. The API key can be obtained from the Google AI Studio.

```bash
export GOOGLE_API_KEY="your-google-key"
```

--------------------------------

### Configure Google API Key (Bash)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/examples/index.md

This command sets the GOOGLE_API_KEY environment variable, which is required for certain functionalities within DataFrameIt that interact with Google services. Replace 'your-key' with your actual API key.

```bash
export GOOGLE_API_KEY="your-key"
```

--------------------------------

### Complete DataFrameIt Example with Pydantic Model

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/reference/llm-reference.md

Demonstrates a full workflow using DataFrameIt. It includes defining a Pydantic model for structured output, preparing a pandas DataFrame, calling the dataframeit function, and printing the resulting DataFrame with extracted information.

```python
from pydantic import BaseModel, Field
from typing import Literal, List, Optional
import pandas as pd
from dataframeit import dataframeit

# 1. Define Pydantic model
class Analysis(BaseModel):
    sentiment: Literal['positive', 'negative', 'neutral']
    confidence: Literal['high', 'medium', 'low']
    topics: List[str] = Field(description="Main topics")
    summary: str = Field(description="Summary in one sentence")

# 2. Data
df = pd.DataFrame({
    'text': [
        'Excellent product! Fast delivery.',
        'Terrible service, took too long.',
        'Ok, nothing special.'
    ]
})

# 3. Process
result = dataframeit(
    df,
    Analysis,
    "Analyze the text and extract the requested information.",
    text_column='text'
)

# 4. Result contains columns: text, sentiment, confidence, topics, summary
print(result)
```

--------------------------------

### Configure OpenAI API Key

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/installation.md

Sets the OPENAI_API_KEY environment variable for authenticating with OpenAI. The API key can be obtained from the OpenAI Platform.

```bash
export OPENAI_API_KEY="your-openai-key"
```

--------------------------------

### Configure Anthropic API Key

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/installation.md

Sets the ANTHROPIC_API_KEY environment variable for authenticating with Anthropic. The API key can be obtained from the Anthropic Console.

```bash
export ANTHROPIC_API_KEY="your-anthropic-key"
```

--------------------------------

### Configure Parallel Processing and Token Tracking in DataFrameIt (Python)

Source: https://context7.com/bdcdo/dataframeit/llms.txt

Illustrates how to optimize `dataframeit()` performance using parallel processing and monitor costs with token tracking. The example sets `parallel_requests` for concurrent operations and `track_tokens` for usage monitoring, suitable for large datasets.
```python
from pydantic import BaseModel
from typing import Literal
import pandas as pd
from dataframeit import dataframeit

class Analysis(BaseModel):
    category: Literal['tech', 'health', 'finance', 'other']
    relevance: Literal['high', 'medium', 'low']

df = pd.DataFrame({'texto': [f'Text {i}' for i in range(100)]})

# High-speed processing with 5 parallel workers
result = dataframeit(
    df,
    Analysis,
    "Categorize: {texto}",
    parallel_requests=5,  # 5 concurrent requests
    track_tokens=True     # Monitor token usage
)
```

--------------------------------

### DataFrameIt Basic Search Example (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/web-search.md

A concise example of enabling basic web search within DataFrameIt. The `use_search=True` parameter activates the search functionality for data enrichment.

```python
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    use_search=True
)
```

--------------------------------

### Install DataFrameIt with Search Dependency

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/web-search.md

Install the necessary dependencies for DataFrameIt to enable web search functionality. This involves using pip to install the 'dataframeit' package with the 'search' extra or the 'langchain-tavily' package directly.

```bash
pip install dataframeit[search]
# or
pip install langchain-tavily
```

--------------------------------

### Create Pydantic Models for Legal Analysis (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/examples/index.md

This snippet illustrates the creation of complex, nested Pydantic models for legal case analysis. It includes models for 'Party' and 'Decision', demonstrating features like lists of objects, literal types, and optional fields.
```python
from pydantic import BaseModel
from typing import Literal, List

class Party(BaseModel):
    name: str
    type: Literal['plaintiff', 'defendant']

class Decision(BaseModel):
    parties: List[Party]
    outcome: Literal['granted', 'denied']
```

--------------------------------

### Passing API Key Directly to DataFrameIT

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/providers.md

This Python code example shows an alternative method for providing API keys directly within the `dataframeit` function call, bypassing environment variables. It includes a warning about security best practices.

```python
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='openai',
    model='gpt-5.2-mini',
    api_key='sk-...'  # Your key directly
)
```

--------------------------------

### DataFrameIt Search Per Field Example (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/web-search.md

Illustrates how to perform a separate web search for each field in the Pydantic model using DataFrameIt. This is useful when dealing with models that have numerous fields, potentially improving search relevance.

```python
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    use_search=True,
    search_per_field=True  # One search per model field
)
```

--------------------------------

### Defining Pydantic Models for DataFrameIt

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/reference/llm-reference.md

Provides examples of defining Pydantic models for use with DataFrameIt. It covers defining fields with literal values, optional fields, lists, and nested model structures, all of which dictate the output schema for LLM extraction.
```python
from pydantic import BaseModel, Field
from typing import Literal, List, Optional

# Fields with fixed values
class Example(BaseModel):
    category: Literal['A', 'B', 'C']

# Optional fields
class Example(BaseModel):
    notes: Optional[str] = Field(default=None, description="Observations")

# Lists
class Example(BaseModel):
    tags: List[str] = Field(description="List of tags")

# Nested models
class Address(BaseModel):
    city: str
    state: str

class Person(BaseModel):
    name: str
    address: Optional[Address] = None
```

--------------------------------

### Create Pydantic Model for Sentiment Analysis (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/examples/index.md

This snippet demonstrates how to define a basic Pydantic model for sentiment analysis, specifying the 'sentiment' field with allowed literal values. It serves as an introduction to data modeling within DataFrameIt.

```python
from pydantic import BaseModel
from typing import Literal

class Sentiment(BaseModel):
    sentiment: Literal['positive', 'negative', 'neutral']
```

--------------------------------

### Real Example: Legal Analysis Pydantic Model

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/structured-output.md

Provides a comprehensive Pydantic model (`CourtDecision`) for extracting structured information from legal documents, showcasing nested models for parties and claims, along with literal types and optional fields. It also includes a prompt for DataFrameIT.
```python
from pydantic import BaseModel, Field
from typing import List, Optional, Literal
from dataframeit import dataframeit

class Party(BaseModel):
    """Party involved in the case."""
    name: str = Field(description="Full name of the party")
    type: Literal['plaintiff', 'defendant', 'third_party'] = Field(description="Party type")
    tax_id: Optional[str] = Field(default=None, description="Tax ID")

class Claim(BaseModel):
    """Claim made in the case."""
    description: str = Field(description="Claim description")
    amount: Optional[float] = Field(default=None, description="Amount in USD")
    granted: Optional[bool] = Field(default=None, description="Whether it was granted")

class CourtDecision(BaseModel):
    """Complete analysis of a court decision."""

    # Identification
    case_number: str = Field(description="Case number")
    court: str = Field(description="Court (e.g., Supreme Court, District Court)")
    decision_date: str = Field(description="Decision date (YYYY-MM-DD)")

    # Parties
    parties: List[Party] = Field(description="Parties involved")

    # Merit
    decision_type: Literal['judgment', 'ruling', 'order', 'interlocutory']
    outcome: Literal['granted', 'denied', 'partially_granted', 'dismissed']

    # Claims
    claims: List[Claim] = Field(description="Claims analyzed")

    # Summary
    summary: str = Field(description="Decision summary in up to 100 words")
    legal_grounds: List[str] = Field(description="Main legal grounds")

PROMPT = """
Analyze the court decision below and extract all relevant information.
Be precise with dates, amounts, and names.
If information is not available, use null.
"""

# df_decisions is assumed to be a DataFrame holding the decision texts in a 'text' column
result = dataframeit(df_decisions, CourtDecision, PROMPT, text_column='text')
```

--------------------------------

### Configure DataFrameIt Retry Parameters (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/error-handling.md

This code example shows how to configure the automatic retry mechanism in DataFrameIt using exponential backoff.
It allows setting maximum retries, base delay, and maximum delay to manage transient errors during processing. Default values are provided for context.

```python
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    max_retries=5,    # Maximum attempts (default: 3)
    base_delay=2.0,   # Initial delay in seconds (default: 1.0)
    max_delay=60.0    # Maximum delay in seconds (default: 30.0)
)
```

--------------------------------

### Implement Rate Limiting in DataFrameIt (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/error-handling.md

This Python example shows how to prevent rate limit errors by configuring a delay between requests when using DataFrameIt. The `rate_limit_delay` parameter adds a specified pause, ensuring that API rate limits are not exceeded and processing is smoother.

```python
# Prevents rate limit errors
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    rate_limit_delay=1.0  # 1 second between requests
)
```

--------------------------------

### Define Pydantic Model for Sentiment Analysis

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/getting-started/quickstart.md

Defines a Pydantic model named 'Sentiment' with specific literal values for sentiment and confidence. This model ensures that the LLM returns valid, predefined categories for sentiment analysis.

```python
from pydantic import BaseModel, Field
from typing import Literal

class Sentiment(BaseModel):
    """Sentiment analysis of a text."""

    sentiment: Literal['positive', 'negative', 'neutral'] = Field(
        description="Overall sentiment of the text"
    )
    confidence: Literal['high', 'medium', 'low'] = Field(
        description="Confidence level in the classification"
    )
```

--------------------------------

### Configuring LLM Providers in DataFrameIt

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/reference/llm-reference.md

Shows how to specify different LLM providers (Google Gemini, OpenAI, Anthropic) when using the DataFrameIt function.
It demonstrates passing the `provider` argument and optionally specifying a particular `model` and `model_kwargs` for customization.

```python
# Google Gemini (default)
result = dataframeit(df, Model, PROMPT, text_column='text')

# OpenAI
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='openai',
    model='gpt-5.2-mini'
)

# Anthropic
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='anthropic',
    model='claude-sonnet-4.5'
)

# With extra parameters
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='openai',
    model='gpt-5.2-mini',
    model_kwargs={'temperature': 0.2}
)
```

--------------------------------

### DataFrameIt LLM Provider Configuration

Source: https://context7.com/bdcdo/dataframeit/llms.txt

Demonstrates how to configure and use different LLM providers (OpenAI, Anthropic, etc.) with DataFrameIt by specifying the `provider` and `model` parameters. Ensure the corresponding API keys are set as environment variables.

```APIDOC
## dataframeit() - With Different LLM Providers

### Description
Allows specifying different LLM providers and models for text processing. Requires setting the appropriate API key as an environment variable for the chosen provider.

### Method
`dataframeit` function call

### Parameters
#### Core Parameters
- **df** (pandas.DataFrame or polars.DataFrame or list or dict) - Input data.
- **output_model** (pydantic.BaseModel) - Pydantic model for output structure.
- **prompt_template** (str) - Prompt template.

#### LLM Provider Configuration
- **provider** (str) - Name of the LLM provider (e.g., 'google_genai', 'openai', 'anthropic').
- **model** (str) - Name of the specific model to use (e.g., 'gemini-3.0-flash', 'gpt-5.2-mini', 'claude-sonnet-4.5').
- **model_kwargs** (dict) - Additional parameters to pass to the LLM model (e.g., `{'temperature': 0.2, 'max_tokens': 1000}`).

### Environment Variables
- `GOOGLE_API_KEY`: For Google Gemini.
- `OPENAI_API_KEY`: For OpenAI models.
- `ANTHROPIC_API_KEY`: For Anthropic Claude models.

### Request Example (OpenAI)
```python
from pydantic import BaseModel
from typing import Literal
import pandas as pd
from dataframeit import dataframeit

class Sentiment(BaseModel):
    sentiment: Literal['positive', 'negative', 'neutral']

df = pd.DataFrame({'texto': ['Great service!', 'Terrible experience.']})

# export OPENAI_API_KEY="your-key"
result = dataframeit(
    df,
    Sentiment,
    "Analyze sentiment: {texto}",
    provider='openai',
    model='gpt-5.2-mini'
)

# With custom model parameters
result = dataframeit(
    df,
    Sentiment,
    "Analyze sentiment: {texto}",
    provider='openai',
    model='gpt-5.2',
    model_kwargs={
        'temperature': 0.2,
        'max_tokens': 1000
    }
)
```

### Request Example (Anthropic Claude)
```python
from pydantic import BaseModel
from typing import Literal
import pandas as pd
from dataframeit import dataframeit

class Sentiment(BaseModel):
    sentiment: Literal['positive', 'negative', 'neutral']

df = pd.DataFrame({'texto': ['Great service!', 'Terrible experience.']})

# export ANTHROPIC_API_KEY="your-key"
result = dataframeit(
    df,
    Sentiment,
    "Analyze sentiment: {texto}",
    provider='anthropic',
    model='claude-sonnet-4.5'
)
```

### Response
#### Success Response (200)
Returns the input DataFrame enriched with the extracted fields according to the specified Pydantic model.
```

--------------------------------

### Use Different LLM Providers with DataFrameIt (Python)

Source: https://context7.com/bdcdo/dataframeit/llms.txt

Shows how to configure and use various LLM providers (Google Gemini, OpenAI, Anthropic) with the `dataframeit()` function. It highlights setting the `provider` and `model` parameters and passing custom model arguments. Ensure the respective API keys are set as environment variables.
```python
from pydantic import BaseModel
from typing import Literal
import pandas as pd
from dataframeit import dataframeit

class Sentiment(BaseModel):
    sentiment: Literal['positive', 'negative', 'neutral']

df = pd.DataFrame({'texto': ['Great service!', 'Terrible experience.']})

# Google Gemini (default)
# export GOOGLE_API_KEY="your-key"
result = dataframeit(
    df,
    Sentiment,
    "Analyze sentiment: {texto}",
    provider='google_genai',
    model='gemini-3.0-flash'
)

# OpenAI
# export OPENAI_API_KEY="your-key"
result = dataframeit(
    df,
    Sentiment,
    "Analyze sentiment: {texto}",
    provider='openai',
    model='gpt-5.2-mini'
)

# Anthropic Claude
# export ANTHROPIC_API_KEY="your-key"
result = dataframeit(
    df,
    Sentiment,
    "Analyze sentiment: {texto}",
    provider='anthropic',
    model='claude-sonnet-4.5'
)

# With custom model parameters
result = dataframeit(
    df,
    Sentiment,
    "Analyze sentiment: {texto}",
    provider='openai',
    model='gpt-5.2',
    model_kwargs={
        'temperature': 0.2,
        'max_tokens': 1000
    }
)
```

--------------------------------

### Using Different Providers with DataFrameIt (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/basic-usage.md

Shows how to configure DataFrameIt to use different AI model providers like Google Gemini (default), OpenAI, and Anthropic Claude. Specify the provider and model name to switch between them. Requires dataframeit.
```python
# Google Gemini (default)
result = dataframeit(df, Model, PROMPT, text_column='text')
```

```python
# OpenAI
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='openai',
    model='gpt-5.2-mini'
)
```

```python
# Anthropic Claude
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    provider='anthropic',
    model='claude-sonnet-4.5'
)
```

--------------------------------

### Load DataFrame with Different Normalization Options

Source: https://context7.com/bdcdo/dataframeit/llms.txt

Demonstrates loading data from Excel files with automatic normalization (JSON strings to Python objects), precise normalization using a Pydantic model, or no normalization at all. Also shows supported file formats and passing additional pandas arguments.

```python
from dataframeit import read_df
from pydantic import BaseModel
from typing import List, Dict

class Extraction(BaseModel):
    keywords: List[str]
    title: str

# Load with automatic normalization (JSON strings -> Python objects)
df_loaded = read_df('results.xlsx')
print(type(df_loaded['keywords'].iloc[0]))

# Load with Pydantic model for precise normalization
df_loaded = read_df('results.xlsx', model=Extraction)

# Load without normalization
df_raw = read_df('results.xlsx', normalize=False)
print(type(df_raw['keywords'].iloc[0]))

# Supported formats
df = read_df('data.xlsx')     # Excel
df = read_df('data.csv')      # CSV
df = read_df('data.parquet')  # Parquet
df = read_df('data.json')     # JSON

# Pass additional pandas arguments
df = read_df('data.csv', encoding='utf-8', sep=';')
df = read_df('data.xlsx', sheet_name='Sheet2')
```

--------------------------------

### Incremental Processing with DataFrame-it in Python

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/reference/llm-reference.md

Demonstrates incremental processing using the DataFrame-it library. It shows how to save intermediate results to an Excel file and then load and continue processing from that saved state.
```python
# Process and save
result = dataframeit(df, Model, PROMPT, text_column='text', resume=True)
result.to_excel('partial.xlsx', index=False)

# Load and continue
df = pd.read_excel('partial.xlsx')
result = dataframeit(df, Model, PROMPT, text_column='text', resume=True)
```

--------------------------------

### Process DataFrame with LLM and Pydantic Model (Python)

Source: https://context7.com/bdcdo/dataframeit/llms.txt

Demonstrates the core functionality of `dataframeit()` to process a pandas DataFrame using a Pydantic model for structured output. It takes input data, a Pydantic model, and a prompt template to extract and append structured information. The default LLM provider is Google Gemini.

```python
from pydantic import BaseModel, Field
from typing import Literal, List
import pandas as pd
from dataframeit import dataframeit

# Define output structure with Pydantic
class ProductReview(BaseModel):
    sentiment: Literal['positive', 'negative', 'neutral']
    confidence: Literal['high', 'medium', 'low']
    keywords: List[str] = Field(description="Key topics mentioned")
    summary: str = Field(description="One sentence summary")

# Input data
df = pd.DataFrame({
    'texto': [
        'Amazing product! Exceeded all expectations, fast shipping.',
        'Terrible quality, broke after one week. Never buying again.',
        'Decent product for the price. Nothing special but works fine.'
    ]
})

# Process with default settings (Google Gemini)
result = dataframeit(
    df,
    ProductReview,
    "Analyze the following product review: {texto}"
)

# Result includes original text + extracted columns
print(result[['texto', 'sentiment', 'confidence', 'keywords', 'summary']])
```

--------------------------------

### Prompt Templating for DataFrame-it in Python

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/reference/llm-reference.md

Illustrates two methods for defining prompt templates in Python for use with DataFrame-it.
The first is a simple text string appended to the input, while the second uses a placeholder '{texto}' for more control over prompt structure.

```python
# Simple - text added at the end
PROMPT = "Classify the sentiment of the text."

# With placeholder - control the position
PROMPT = """
Analyze the document below:

{texto}

Extract the requested information.
"""
```

--------------------------------

### DataFrameIt Performance Tuning: Parallelism and Rate Limiting

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/reference/llm-reference.md

Explains how to optimize DataFrameIt's performance by enabling parallel requests and configuring rate limit delays. This helps in processing large datasets more efficiently and avoiding API rate limit errors.

```python
# Parallel processing
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    parallel_requests=5  # 5 simultaneous workers
)

# Rate limiting (prevents 429 error)
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    rate_limit_delay=1.0  # 1 second between requests
)

# Combined
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    parallel_requests=5,
    rate_limit_delay=0.5
)
```

--------------------------------

### Token Tracking and Cost Calculation

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/performance.md

Enables `track_tokens=True` to monitor input and output token usage per row and in total. This allows for accurate cost estimation by multiplying token counts with provider-specific prices. Additional columns like `_input_tokens`, `_output_tokens`, and `_total_tokens` are added to the DataFrame.
```python
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    track_tokens=True
)

# At the end, displays:
# ============================================================
# TOKEN USAGE STATISTICS
# ============================================================
# Model: gemini-3.0-flash
# Total tokens: 15,432
#   • Input: 12,345 tokens
#   • Output: 3,087 tokens
# ============================================================
```

```python
result = dataframeit(df, Model, PROMPT, text_column='text', track_tokens=True)

# Example: Gemini 2.0 Flash prices
price_input = 0.075 / 1_000_000  # $0.075 per 1M tokens
price_output = 0.30 / 1_000_000  # $0.30 per 1M tokens

cost_input = result['_input_tokens'].sum() * price_input
cost_output = result['_output_tokens'].sum() * price_output
total_cost = cost_input + cost_output

print(f"Estimated cost: ${total_cost:.4f}")
```

--------------------------------

### Input Types for DataFrameIt (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/basic-usage.md

Demonstrates using DataFrameIt with different input data structures: lists, dictionaries, and pandas Series. Each input type is processed to produce a DataFrame, maintaining or adapting the index as appropriate. Requires pandas and dataframeit.
```python
# With List
texts = ['Text 1', 'Text 2', 'Text 3']
result = dataframeit(texts, Sentiment, PROMPT)
# Returns DataFrame with numeric index
```

```python
# With Dictionary
documents = {
    'doc_001': 'Content of document 1',
    'doc_002': 'Content of document 2',
}
result = dataframeit(documents, Sentiment, PROMPT)
# Returns DataFrame with keys as index
```

```python
# With Series
series = pd.Series(['Text A', 'Text B'], index=['id_1', 'id_2'])
result = dataframeit(series, Sentiment, PROMPT)
# Preserves original index
```

--------------------------------

### DataFrameIt Supported Input Data Types

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/reference/llm-reference.md

Illustrates how to use DataFrameIt with various input data structures including pandas DataFrames, lists, dictionaries, and pandas Series. It highlights that `text_column` is required for DataFrames and not needed for the other types.

```python
# DataFrame (requires text_column)
df = pd.DataFrame({'text': ['A', 'B']})
result = dataframeit(df, Model, PROMPT, text_column='text')

# List (no text_column needed)
texts = ['Text 1', 'Text 2']
result = dataframeit(texts, Model, PROMPT)

# Dictionary (keys become index)
docs = {'id1': 'Text 1', 'id2': 'Text 2'}
result = dataframeit(docs, Model, PROMPT)

# Series (preserves index)
series = pd.Series(['A', 'B'], index=['x', 'y'])
result = dataframeit(series, Model, PROMPT)
```

--------------------------------

### Basic DataFrameIt Web Search Usage (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/web-search.md

Demonstrates the basic usage of DataFrameIt with web search enabled. It defines a Pydantic model for company information, prepares a pandas DataFrame, and then calls the `dataframeit` function with `use_search=True` to enrich the data using web searches.
```python
from pydantic import BaseModel, Field
from typing import Literal
import pandas as pd
from dataframeit import dataframeit

class CompanyInfo(BaseModel):
    sector: Literal['technology', 'health', 'finance', 'retail', 'other']
    description: str = Field(description="Brief company description")
    founded: str = Field(description="Year founded, if found")

# Data with company names
df = pd.DataFrame({
    'text': ['Microsoft', 'Stripe', 'DoorDash']
})

PROMPT = """
Based on available information and web search, extract information about the mentioned company.
"""

# Enable web search with use_search=True
result = dataframeit(
    df,
    CompanyInfo,
    PROMPT,
    text_column='text',
    use_search=True,  # Enable web search
    max_results=5  # Number of results per search
)
```

--------------------------------

### DataFrameIt Function Signature and Parameters

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/reference/llm-reference.md

Defines the core function signature for DataFrameIt, outlining all available parameters. These include data input, Pydantic model for schema, prompt template, text column specification, LLM provider and model selection, retry mechanisms, and optional web search integration.
```python
from dataframeit import dataframeit

result = dataframeit(
    data,                      # DataFrame, Series, list or dict
    questions,                 # Pydantic model
    prompt,                    # Prompt template
    text_column='text',        # Column with texts (required for DataFrame)
    model='gemini-3.0-flash',
    provider='google_genai',   # 'google_genai', 'openai', 'anthropic'
    resume=True,               # Continue from where it stopped
    parallel_requests=1,       # Parallel workers
    rate_limit_delay=0.0,      # Delay between requests (seconds)
    max_retries=3,             # Retry attempts on error
    track_tokens=True,         # Track token usage
    api_key=None,              # API key (uses env var if None)
    model_kwargs=None,         # Extra parameters (temperature, etc)
    # Web search (requires TAVILY_API_KEY)
    use_search=False,          # Enable web search
    search_per_field=False,    # Separate search per field
    max_results=5,             # Results per search
    search_depth='basic',      # 'basic' or 'advanced'
)
```

--------------------------------

### Text Summarization with Python

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/basic-usage.md

Generates a concise summary, key points, and main topic from text data using a pandas DataFrame and a Pydantic model. Requires pydantic and dataframeit. The output is structured by the `Summary` model.

```python
from pydantic import BaseModel, Field
from dataframeit import dataframeit

class Summary(BaseModel):
    summary: str = Field(description="Summary in up to 50 words")
    key_points: list[str] = Field(description="List of 3-5 main points")
    main_topic: str = Field(description="Central topic in one word")

PROMPT = """
Analyze the text and extract a concise summary.
Identify the main points and central topic.
"""

# Assumes `df` is an existing pandas DataFrame with a 'text' column
result = dataframeit(df, Summary, PROMPT, text_column='text')
```

--------------------------------

### Economy Configuration for DataFrameIt

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/performance.md

Focuses on cost-effectiveness by running tasks sequentially (`parallel_requests=1`), applying a high `rate_limit_delay`, and selecting a cheaper model like `gemini-3.0-flash`.
Token tracking is enabled to monitor exact usage.

```python
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    parallel_requests=1,       # Sequential
    rate_limit_delay=1.5,      # High delay
    model='gemini-3.0-flash',  # Cheap model
    track_tokens=True
)
```

--------------------------------

### Web Search Integration with dataframeit()

Source: https://context7.com/bdcdo/dataframeit/llms.txt

Enrich data by integrating real-time web search using Tavily. Configure search depth, number of results, and per-field searching for detailed information retrieval. Requires TAVILY_API_KEY.

```python
from pydantic import BaseModel, Field
from typing import Literal, List
import pandas as pd
from dataframeit import dataframeit

# export TAVILY_API_KEY="your-tavily-key"

class CompanyInfo(BaseModel):
    sector: Literal['technology', 'healthcare', 'finance', 'retail', 'other']
    description: str = Field(description="Brief company description")
    founded: str = Field(description="Year founded, if found")
    headquarters: str = Field(description="Company headquarters location")

df = pd.DataFrame({
    'texto': ['Microsoft', 'Nubank', 'Shopify']
})

# Basic web search
result = dataframeit(
    df,
    CompanyInfo,
    "Research and extract information about the company: {texto}",
    use_search=True,  # Enable Tavily web search
    max_results=5  # Number of search results per query
)

# Advanced search with more depth
result = dataframeit(
    df,
    CompanyInfo,
    "Research the company: {texto}",
    use_search=True,
    search_depth='advanced',  # 'basic' (1 credit) or 'advanced' (2 credits)
    max_results=10
)

# Search per field (separate search for each Pydantic field)
class DetailedInfo(BaseModel):
    financial_data: str = Field(description="Recent financial performance")
    products: List[str] = Field(description="Main products or services")
    competitors: List[str] = Field(description="Main competitors")

result = dataframeit(
    df,
    DetailedInfo,
    "Research: {texto}",
    use_search=True,
    search_per_field=True  # Separate search for each field
)

# Search
# metrics tracked in output
print(f"Search credits used: {result['_search_credits'].sum()}")
print(f"Total searches: {result['_search_count'].sum()}")
```

--------------------------------

### DataFrameIt Parallel Processing and Performance

Source: https://context7.com/bdcdo/dataframeit/llms.txt

Details on optimizing processing speed using parallel requests, rate limiting, and automatic retries with exponential backoff. Includes token tracking for cost monitoring.

```APIDOC
## dataframeit() - Parallel Processing and Performance

### Description
Configure parameters to control processing speed, reliability, and cost. Enables parallel requests, automatic retries, and token usage tracking.

### Method
`dataframeit` function call

### Parameters
#### Core Parameters
- **df** (pandas.DataFrame or polars.DataFrame or list or dict) - Input data.
- **output_model** (pydantic.BaseModel) - Pydantic model for output structure.
- **prompt_template** (str) - Prompt template.

#### Performance and Reliability Parameters
- **parallel_requests** (int) - The number of concurrent requests to the LLM API. Increases throughput for large datasets. Defaults to 1.
- **track_tokens** (bool) - If True, token usage for each request will be tracked and included in the output, aiding cost monitoring. Defaults to False.
- **retry_strategy** (dict) - Configuration for automatic retries on API errors, including exponential backoff. Example: `{'max_attempts': 3, 'backoff_factor': 0.5}`.
### Request Example
```python
from pydantic import BaseModel
from typing import Literal
import pandas as pd
from dataframeit import dataframeit

class Analysis(BaseModel):
    category: Literal['tech', 'health', 'finance', 'other']
    relevance: Literal['high', 'medium', 'low']

df = pd.DataFrame({'texto': [f'Text {i}' for i in range(100)]})

# Process with 5 parallel workers and token tracking enabled
result = dataframeit(
    df,
    Analysis,
    "Categorize: {texto}",
    parallel_requests=5,  # Process up to 5 requests concurrently
    track_tokens=True  # Enable token usage monitoring
)

print(result)
```

### Response
#### Success Response (200)
Returns the input DataFrame enriched with extracted fields. If `track_tokens` is True, additional columns for prompt tokens, completion tokens, and total tokens will be included.
```

--------------------------------

### Configure Fields with Custom Prompts in dataframeit()

Source: https://context7.com/bdcdo/dataframeit/llms.txt

Configure custom prompts and search parameters for individual fields using Pydantic's `json_schema_extra`. This requires the `search_per_field=True` argument. It allows replacing or appending to default prompts and overriding search parameters like `search_depth` and `max_results`.

```python
from pydantic import BaseModel, Field
import pandas as pd
from dataframeit import dataframeit

class MedicationInfo(BaseModel):
    # Standard field with default behavior
    active_ingredient: str = Field(description="Active ingredient of the medication")

    # Field with completely replaced prompt
    rare_disease: str = Field(
        description="Rare disease classification",
        json_schema_extra={
            "prompt": "Search Orphanet (orpha.net) for rare disease info. Analyze: {texto}"
        }
    )

    # Field with appended prompt instructions
    regulatory_status: str = Field(
        description="FDA approval status",
        json_schema_extra={
            "prompt_append": "Search ONLY FDA.gov for regulatory information."
        }
    )

    # Field with custom search parameters
    clinical_trials: str = Field(
        description="Relevant clinical trials",
        json_schema_extra={
            "prompt_append": "Find recent clinical trials (2020-2024).",
            "search_depth": "advanced",  # Override search depth
            "max_results": 10  # Override max results
        }
    )

df = pd.DataFrame({'texto': ['Pembrolizumab', 'Trastuzumab']})

result = dataframeit(
    df,
    MedicationInfo,
    "Analyze the medication: {texto}",
    use_search=True,
    search_per_field=True  # Required for per-field configuration
)
```

--------------------------------

### Use a More Capable Model in DataFrameIt (Python)

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/error-handling.md

This code snippet illustrates how to select a more capable language model within DataFrameIt to potentially reduce errors caused by complex prompts or data. It shows how to specify a different model, like 'gemini-2.5-pro', which might offer better performance than smaller models.

```python
# If errors persist, try a more capable model
result = dataframeit(
    df, Model, PROMPT,
    text_column='text',
    model='gemini-2.5-pro'  # More capable than flash
)
```

--------------------------------

### Configure Tavily API Key

Source: https://github.com/bdcdo/dataframeit/blob/main/docs/en/guides/web-search.md

Set the TAVILY_API_KEY environment variable with your Tavily API key. This key is required for authentication and enabling web search operations.

```bash
export TAVILY_API_KEY="your-tavily-key"
```

--------------------------------

### Rate-Limited and Balanced Processing with dataframeit()

Source: https://context7.com/bdcdo/dataframeit/llms.txt

Configure rate limiting and retry mechanisms for robust API interactions. The balanced approach uses parallel requests for efficiency on large datasets, with options to track token usage.
```python
from dataframeit import dataframeit

# Rate-limited processing (prevents 429 errors)
result = dataframeit(
    df,
    Analysis,
    "Categorize: {texto}",
    rate_limit_delay=1.0,  # 1 second between requests (60 req/min)
    max_retries=5,  # Retry failed requests up to 5 times
    base_delay=2.0,  # Initial retry delay
    max_delay=30.0  # Maximum retry delay
)

# Balanced approach for large datasets
result = dataframeit(
    df,
    Analysis,
    "Categorize: {texto}",
    parallel_requests=3,
    rate_limit_delay=0.5,
    track_tokens=True
)
```
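The retry parameters above (`max_retries`, `base_delay`, `max_delay`) point to an exponential backoff schedule, but the source does not spell out the exact formula DataFrameIt applies. The following is only a minimal sketch under one common assumption: the wait doubles after each failed attempt and is capped at `max_delay`. The helper `backoff_delays` is hypothetical and is not part of the library.

```python
def backoff_delays(max_retries: int, base_delay: float, max_delay: float) -> list[float]:
    """Hypothetical helper: wait (in seconds) before each retry.

    Assumes a doubling schedule capped at max_delay; the library's
    real schedule may differ (for example, it may add jitter).
    """
    return [min(base_delay * (2 ** attempt), max_delay) for attempt in range(max_retries)]


# Same values as the rate-limited example above
delays = backoff_delays(max_retries=5, base_delay=2.0, max_delay=30.0)
print(delays)       # [2.0, 4.0, 8.0, 16.0, 30.0]
print(sum(delays))  # 60.0 seconds of waiting in the worst case for one row
```

Under this assumption, a row that fails every attempt would spend about a minute waiting, which may be worth keeping in mind when combining a high `max_retries` with a large dataset.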