# Pydantic AI

Pydantic AI is a Python agent framework designed to make building production-grade GenAI applications feel as natural as building web APIs with FastAPI. Built by the Pydantic team, it leverages Pydantic's validation capabilities to provide type-safe, testable, and observable AI agents that work with virtually any LLM provider. The framework emphasizes ergonomic design, full type safety, and seamless integration with modern Python development practices.

The framework provides a unified interface for interacting with multiple model providers including OpenAI, Anthropic, Gemini, Groq, Mistral, Cohere, DeepSeek, Grok, and Cerebras, along with platforms like Azure AI Foundry, Amazon Bedrock, Google Vertex AI, Ollama, LiteLLM, OpenRouter, Together AI, Fireworks AI, Hugging Face, GitHub, Heroku, Vercel, Nebius, OVHcloud, and Outlines. It offers built-in observability through Pydantic Logfire, powerful evaluation capabilities, and support for advanced patterns like streaming, tool calling, structured outputs, human-in-the-loop workflows, durable execution, and multi-agent communication via A2A and MCP protocols. Pydantic AI is designed for reusability: agents are instantiated once and reused throughout your application, similar to FastAPI apps.

## Core APIs and Functions

### Agent Creation

Create an agent with model specification and configuration.

```python
from pydantic_ai import Agent

# Simple agent with text output
agent = Agent(
    'openai:gpt-4o',
    instructions='Be concise, reply with one sentence.'
)

result = agent.run_sync('Where does "hello world" come from?')
print(result.output)
# Output: The first known use of "hello, world" was in a 1974 textbook about the C programming language.
```

### Structured Output with Pydantic Models

Define structured output types to validate and type agent responses.

```python
from pydantic import BaseModel
from pydantic_ai import Agent


class CityLocation(BaseModel):
    city: str
    country: str


agent = Agent('google-gla:gemini-1.5-flash', output_type=CityLocation)
result = agent.run_sync('Where were the olympics held in 2012?')
print(result.output)
# Output: city='London' country='United Kingdom'
print(result.usage())
# Output: RunUsage(input_tokens=57, output_tokens=8, requests=1)
```

### Dependency Injection System

Type-safe dependency injection for tools, instructions, and output validators.

```python
from dataclasses import dataclass

from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext


class DatabaseConn:
    """Fake database for example purposes."""

    @classmethod
    async def customer_name(cls, *, id: int) -> str | None:
        if id == 123:
            return 'John'

    @classmethod
    async def customer_balance(cls, *, id: int, include_pending: bool) -> float:
        if id == 123:
            return 123.45 if include_pending else 100.00
        raise ValueError('Customer not found')


@dataclass
class SupportDependencies:
    customer_id: int
    db: DatabaseConn


class SupportOutput(BaseModel):
    support_advice: str = Field(description='Advice returned to the customer')
    block_card: bool = Field(description="Whether to block the customer's card")
    risk: int = Field(description='Risk level of query', ge=0, le=10)


support_agent = Agent(
    'openai:gpt-4o',
    deps_type=SupportDependencies,
    output_type=SupportOutput,
    instructions='You are a support agent in our bank, give the customer support and judge the risk level of their query.'
)


@support_agent.instructions
async def add_customer_name(ctx: RunContext[SupportDependencies]) -> str:
    customer_name = await ctx.deps.db.customer_name(id=ctx.deps.customer_id)
    return f"The customer's name is {customer_name!r}"


@support_agent.tool
async def customer_balance(ctx: RunContext[SupportDependencies], include_pending: bool) -> str:
    """Returns the customer's current account balance."""
    balance = await ctx.deps.db.customer_balance(
        id=ctx.deps.customer_id,
        include_pending=include_pending
    )
    return f'${balance:.2f}'


# Run the agent
deps = SupportDependencies(customer_id=123, db=DatabaseConn())
result = support_agent.run_sync('What is my balance?', deps=deps)
print(result.output)
# Output: support_advice='Hello John, your current account balance, including pending transactions, is $123.45.' block_card=False risk=1

result = support_agent.run_sync('I just lost my card!', deps=deps)
print(result.output)
# Output: support_advice="I'm sorry to hear that, John. We are temporarily blocking your card to prevent unauthorized transactions." block_card=True risk=8
```

### Function Tools with @agent.tool Decorator

Register functions that the LLM can call to retrieve information.

```python
import random

from pydantic_ai import Agent, RunContext

agent = Agent(
    'google-gla:gemini-1.5-flash',
    deps_type=str,
    system_prompt=(
        "You're a dice game, you should roll the die and see if the number "
        "you get back matches the user's guess. If so, tell them they're a winner. "
        "Use the player's name in the response."
    )
)


@agent.tool_plain  # Tool without context
def roll_dice() -> str:
    """Roll a six-sided die and return the result."""
    return str(random.randint(1, 6))


@agent.tool  # Tool with context access
def get_player_name(ctx: RunContext[str]) -> str:
    """Get the player's name."""
    return ctx.deps


dice_result = agent.run_sync('My guess is 4', deps='Anne')
print(dice_result.output)
# Output: Congratulations Anne, you guessed correctly! You're a winner!
```

### Multi-Tool Weather Agent

Agents can orchestrate multiple tool calls to answer complex queries.

```python
import asyncio
from dataclasses import dataclass
from typing import Any

from httpx import AsyncClient
from pydantic import BaseModel
from pydantic_ai import Agent, RunContext


@dataclass
class Deps:
    client: AsyncClient


weather_agent = Agent(
    'openai:gpt-4.1-mini',
    instructions='Be concise, reply with one sentence.',
    deps_type=Deps,
    retries=2
)


class LatLng(BaseModel):
    lat: float
    lng: float


@weather_agent.tool
async def get_lat_lng(ctx: RunContext[Deps], location_description: str) -> LatLng:
    """Get the latitude and longitude of a location.

    Args:
        ctx: The context.
        location_description: A description of a location.
    """
    r = await ctx.deps.client.get(
        'https://demo-endpoints.pydantic.workers.dev/latlng',
        params={'location': location_description}
    )
    r.raise_for_status()
    return LatLng.model_validate_json(r.content)


@weather_agent.tool
async def get_weather(ctx: RunContext[Deps], lat: float, lng: float) -> dict[str, Any]:
    """Get the weather at a location.

    Args:
        ctx: The context.
        lat: Latitude of the location.
        lng: Longitude of the location.
    """
    # Fetch the temperature and the description concurrently
    temp_response, descr_response = await asyncio.gather(
        ctx.deps.client.get(
            'https://demo-endpoints.pydantic.workers.dev/number',
            params={'min': 10, 'max': 30}
        ),
        ctx.deps.client.get(
            'https://demo-endpoints.pydantic.workers.dev/weather',
            params={'lat': lat, 'lng': lng}
        )
    )
    temp_response.raise_for_status()
    descr_response.raise_for_status()
    return {
        'temperature': f'{temp_response.text} °C',
        'description': descr_response.text
    }


async def main():
    async with AsyncClient() as client:
        deps = Deps(client=client)
        result = await weather_agent.run(
            'What is the weather like in London and in Wiltshire?',
            deps=deps
        )
        print('Response:', result.output)

# Run: asyncio.run(main())
```

### Streaming Output

Stream text responses in real-time with immediate validation.

```python
import asyncio

from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')


# Stream text output
async def stream_example():
    async with agent.run_stream('What is the capital of the UK?') as response:
        async for text in response.stream_text():
            print(text)
            # Output streams:
            # The capital of
            # The capital of the UK is
            # The capital of the UK is London.

# Run: asyncio.run(stream_example())
```

### Iterating Through Agent Run Nodes

Iterate through the agent's execution graph nodes as they execute.

```python
import asyncio

from pydantic_ai import Agent

agent = Agent('openai:gpt-4o')


async def iterate_example():
    nodes = []
    # Iterate through the run, recording each node along the way
    async with agent.iter('What is the capital of France?') as agent_run:
        async for node in agent_run:
            nodes.append(node)
            print(f"Executed node: {node}")

    # Access the final result
    print(f"Final result: {agent_run.result.output}")

# Run: asyncio.run(iterate_example())
```

### Multiple Output Types with Union

Return either structured data or plain text based on model decision.

```python
from pydantic import BaseModel
from pydantic_ai import Agent


class Box(BaseModel):
    width: int
    height: int
    depth: int
    units: str


agent = Agent(
    'openai:gpt-4o-mini',
    output_type=[Box, str],
    system_prompt=(
        "Extract me the dimensions of a box, "
        "if you can't extract all data, ask the user to try again."
    )
)

result = agent.run_sync('The box is 10x20x30')
print(result.output)
# Output: Please provide the units for the dimensions (e.g., cm, in, m).

result = agent.run_sync('The box is 10x20x30 cm')
print(result.output)
# Output: width=10 height=20 depth=30 units='cm'
```

### Message History and Conversation Context

Access and continue conversations using message history.

```python
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o', system_prompt='Be a helpful assistant.')

result = agent.run_sync('Tell me a joke.')
print(result.output)
# Output: Did you hear about the toothpaste scandal? They called it Colgate.

# Access message history
messages = result.all_messages()
print(messages)
# Output: [ModelRequest(...), ModelResponse(...)]

# Continue conversation with history
result2 = agent.run_sync(
    'Can you explain it?',
    message_history=result.new_messages()
)
```

### RAG Pattern with Vector Search

Implement retrieval-augmented generation with tool-based document retrieval.

```python
import asyncio
from dataclasses import dataclass

import asyncpg
import pydantic_core
from openai import AsyncOpenAI
from pydantic_ai import Agent, RunContext


@dataclass
class Deps:
    openai: AsyncOpenAI
    pool: asyncpg.Pool


agent = Agent('openai:gpt-4o', deps_type=Deps)


@agent.tool
async def retrieve(context: RunContext[Deps], search_query: str) -> str:
    """Retrieve documentation sections based on a search query.

    Args:
        context: The call context.
        search_query: The search query.
    """
    # Create embedding for search query
    embedding = await context.deps.openai.embeddings.create(
        input=search_query,
        model='text-embedding-3-small'
    )
    # Serialize the vector to text (e.g. '[0.1,0.2,...]') so it can be passed as a query parameter
    embedding_json = pydantic_core.to_json(embedding.data[0].embedding).decode()

    # Query vector database for similar documents
    rows = await context.deps.pool.fetch(
        'SELECT url, title, content FROM doc_sections ORDER BY embedding <-> $1 LIMIT 8',
        embedding_json
    )
    return '\n\n'.join(
        f'# {row["title"]}\nDocumentation URL:{row["url"]}\n\n{row["content"]}\n'
        for row in rows
    )


async def run_agent(question: str):
    """Entry point to run the agent and perform RAG based question answering."""
    openai = AsyncOpenAI()
    # database_connect() is not defined here; it is assumed to be an async
    # context manager that yields an asyncpg connection pool
    async with database_connect() as pool:
        deps = Deps(openai=openai, pool=pool)
        answer = await agent.run(question, deps=deps)
        print(answer.output)

# Run: asyncio.run(run_agent("How do I configure logfire to work with FastAPI?"))
```

### Unit Testing with TestModel

Test agents without making real LLM API calls.

```python
import pytest

from pydantic_ai import Agent, RunContext, models
from pydantic_ai.models.test import TestModel

weather_agent = Agent(
    'openai:gpt-4o',
    deps_type=str,
    system_prompt='Providing a weather forecast at the locations the user provides.'
)


@weather_agent.tool
def weather_forecast(ctx: RunContext[str], location: str) -> str:
    """Get weather forecast for a location."""
    return f'The weather in {location} is sunny.'


@pytest.mark.asyncio
async def test_weather_agent():
    # Block all real model requests globally
    models.ALLOW_MODEL_REQUESTS = False

    # Override agent with TestModel
    with weather_agent.override(model=TestModel()):
        result = await weather_agent.run('What is the weather in London?', deps='test')

    assert 'London' in result.output or 'weather' in result.output.lower()
    # TestModel calls the registered tool before answering, so a run can
    # record more than one model request
    assert result.usage().requests >= 1
```

### Multiple Model Providers

Use different LLM providers with a consistent API.

```python
from pydantic_ai import Agent

# OpenAI
openai_agent = Agent('openai:gpt-4o')

# Anthropic
anthropic_agent = Agent('anthropic:claude-3-5-sonnet-latest')

# Google Gemini
gemini_agent = Agent('google-gla:gemini-1.5-flash')

# Groq
groq_agent = Agent('groq:llama-3.3-70b-versatile')

# DeepSeek
deepseek_agent = Agent('deepseek:deepseek-chat')

# Grok
grok_agent = Agent('grok:grok-2-latest')

# Ollama (local models)
ollama_agent = Agent('ollama:llama3')

# LiteLLM (unified interface)
litellm_agent = Agent('litellm:gpt-4')

# OpenRouter
openrouter_agent = Agent('openrouter:anthropic/claude-3.5-sonnet')

# Together AI
together_agent = Agent('together:meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo')

# All agents share the same interface
result = openai_agent.run_sync('Hello!')
result = anthropic_agent.run_sync('Hello!')
result = gemini_agent.run_sync('Hello!')
result = groq_agent.run_sync('Hello!')
```

### Fallback Model for Reliability

Automatically fall back to alternative models on failure.

```python
from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.models.fallback import FallbackModel
from pydantic_ai.models.openai import OpenAIChatModel

openai_model = OpenAIChatModel('gpt-4o')
anthropic_model = AnthropicModel('claude-3-5-sonnet-latest')

# Models are tried in order; the next one is used if the previous one fails
fallback_model = FallbackModel(openai_model, anthropic_model)

agent = Agent(fallback_model)
response = agent.run_sync('What is the capital of France?')
print(response.output)
# Output: Paris
```

### Streaming Events with Event Handler

Monitor agent execution with streaming events.

```python
import asyncio
from collections.abc import AsyncIterable

from pydantic_ai import (
    Agent,
    AgentStreamEvent,
    RunContext,
    PartStartEvent,
    PartDeltaEvent,
    TextPartDelta,
    FunctionToolCallEvent,
    FunctionToolResultEvent
)

weather_agent = Agent(
    'openai:gpt-4o',
    system_prompt='Providing a weather forecast at the locations the user provides.'
)


@weather_agent.tool
async def weather_forecast(ctx: RunContext, location: str) -> str:
    """Get weather forecast."""
    return f'The forecast in {location} is 24°C and sunny.'


output_messages: list[str] = []


async def handle_event(event: AgentStreamEvent):
    if isinstance(event, PartStartEvent):
        output_messages.append(f'Starting part {event.index}')
    elif isinstance(event, PartDeltaEvent):
        if isinstance(event.delta, TextPartDelta):
            output_messages.append(f'Text delta: {event.delta.content_delta!r}')
    elif isinstance(event, FunctionToolCallEvent):
        output_messages.append(f'Tool call: {event.part.tool_name} with {event.part.args}')
    elif isinstance(event, FunctionToolResultEvent):
        output_messages.append(f'Tool result: {event.result.content}')


async def event_stream_handler(ctx: RunContext, event_stream: AsyncIterable[AgentStreamEvent]):
    async for event in event_stream:
        await handle_event(event)


async def main():
    async with weather_agent.run_stream(
        'What will the weather be like in Paris?',
        event_stream_handler=event_stream_handler
    ) as run:
        async for output in run.stream_text():
            print(output)

# Run: asyncio.run(main())
```

### Builtin Tools

Use pre-built tools for common agent capabilities. Builtin tools are executed by the model provider, so which tools and options are available depends on the provider and model you choose.

```python
from pydantic_ai import Agent
from pydantic_ai import WebSearchTool, MemoryTool, UrlContextTool, CodeExecutionTool, ImageGenerationTool

# Web search tool
web_search = WebSearchTool(
    search_context_size='medium',
    max_uses=5,
    blocked_domains=['example.com']
)

# Memory tool for persistent storage (configured without arguments)
memory = MemoryTool()

# URL context extraction
url_context = UrlContextTool()

# Code execution (sandboxed)
code_exec = CodeExecutionTool()

# Image generation tool
image_gen = ImageGenerationTool(
    background='transparent',
    input_fidelity='high'
)

# Not every provider supports every tool or option listed here; check the
# provider's builtin tool support before combining them on one agent
agent = Agent(
    'openai:gpt-4o',
    builtin_tools=[web_search, memory, url_context, image_gen]
)

result = agent.run_sync('Search for the latest Python 3.13 features and create a diagram')
```

### Model Context Protocol (MCP) Integration

Connect to MCP servers for external tool access.

```python
from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerStdio

# Create MCP server connection: the server runs as a subprocess and
# communicates over stdio
mcp_server = MCPServerStdio(
    'uvx',
    args=['mcp-server-filesystem', '/path/to/data']
)

# Register the MCP server as a toolset so the agent can call its tools
agent = Agent(
    'anthropic:claude-3-5-sonnet-latest',
    toolsets=[mcp_server],
    instructions='You can access the filesystem via MCP tools.'
)

result = agent.run_sync('List the files in the data directory')
```

### Human-in-the-Loop Tool Approval

Require approval for sensitive tool calls.

```python
from pydantic_ai import Agent, RunContext
from pydantic_ai import DeferredToolRequests, DeferredToolResults, ToolApproved, ToolDenied

# DeferredToolRequests must be an allowed output type so that a run can end
# with pending approvals instead of a final answer
agent = Agent('openai:gpt-4o', output_type=[str, DeferredToolRequests])


@agent.tool(requires_approval=True)
async def delete_user(ctx: RunContext, user_id: int) -> str:
    """Delete a user from the system."""
    return f'User {user_id} deleted successfully'


# First run - agent requests tool call
result = agent.run_sync('Delete user 123')

# Result contains deferred tool requests: review and approve/deny each one
if isinstance(result.output, DeferredToolRequests):
    results = DeferredToolResults()
    for call in result.output.approvals:
        # safe_to_delete is a placeholder for your own policy check
        if safe_to_delete(call.args_as_dict()['user_id']):
            results.approvals[call.tool_call_id] = ToolApproved()
        else:
            results.approvals[call.tool_call_id] = ToolDenied('User is admin')

    # Continue the run with the approval decisions
    final_result = agent.run_sync(
        message_history=result.all_messages(),
        deferred_tool_results=results
    )
```

### Durable Execution with Prefect

Build agents that survive failures and restarts.

```python
import asyncio

from pydantic_ai import Agent, RunContext
from pydantic_ai.durable_exec.prefect import PrefectAgent

# Give the agent a name so its durable flows and tasks can be identified
agent = Agent('openai:gpt-4o', name='data_processor')


@agent.tool
async def long_running_task(ctx: RunContext, data: str) -> str:
    """A task that might fail or take a long time."""
    # process_data is a placeholder for your own long-running coroutine
    result = await process_data(data)
    return result


# Wrap agent with Prefect for durability
durable_agent = PrefectAgent(agent)


async def main():
    # Runs will be persisted and can recover from failures
    result = await durable_agent.run('Process important data')
    print(result.output)

# Run: asyncio.run(main())
```

### Agent-to-Agent Communication (A2A)

Enable multi-agent workflows with the A2A protocol.

```python
import asyncio

from pydantic_ai import Agent, RunContext

# Create specialized agents
research_agent = Agent('openai:gpt-4o', instructions='Research topics thoroughly')
writer_agent = Agent('anthropic:claude-3-5-sonnet-latest', instructions='Write clear content')

# Note: A2A support requires the fasta2a package
# Install with: pip install "pydantic-ai-slim[a2a]"
# An agent can be exposed as an A2A server with agent.to_a2a(), which returns an ASGI app
# For production use, consult the official documentation for the current A2A integration pattern

# The example below shows in-process delegation: a coordinator agent calls
# the other agents as tools
coordinator = Agent('openai:gpt-4o')


@coordinator.tool
async def research_topic(ctx: RunContext, topic: str) -> str:
    """Research a topic using the research agent."""
    result = await research_agent.run(topic)
    return result.output


@coordinator.tool
async def write_article(ctx: RunContext, research_data: str) -> str:
    """Write an article using the writer agent."""
    result = await writer_agent.run(f'Write article about: {research_data}')
    return result.output


async def main():
    # Orchestrate multi-agent workflow
    result = await coordinator.run('Research and write about quantum computing')
    print(result.output)

# Run: asyncio.run(main())
```

### AG-UI Interactive Applications

Build streaming interactive UIs with the AG-UI protocol.

```python
from pydantic_ai import Agent, RunContext
from pydantic_ai.ag_ui import AGUIApp, handle_ag_ui_request

agent = Agent('openai:gpt-4o')


@agent.tool
async def fetch_data(ctx: RunContext, query: str) -> str:
    """Fetch data for the query."""
    # database is a placeholder for your own data-access object
    return await database.query(query)


# Create AG-UI app
app = AGUIApp(agent)

# The app provides endpoints that automatically stream events to UI:
# - Text deltas for real-time response
# - Tool calls with arguments
# - Thinking parts for reasoning visibility
# - Final results

# Run as ASGI app
# uvicorn myapp:app --host 0.0.0.0 --port 8000
```

### Output Validators

Validate and retry agent outputs.

```python
from pydantic import BaseModel
from pydantic_ai import Agent, RunContext, ModelRetry


class ProductRecommendation(BaseModel):
    product_name: str
    price: float
    reasoning: str


agent = Agent('openai:gpt-4o', output_type=ProductRecommendation)


@agent.output_validator
async def validate_price(ctx: RunContext, output: ProductRecommendation) -> ProductRecommendation:
    """Ensure price is reasonable."""
    # Raising ModelRetry sends the message back to the model and asks it to try again
    if output.price < 0:
        raise ModelRetry('Price cannot be negative, please provide a valid price.')
    if output.price > 10000:
        raise ModelRetry('Price seems too high, please double-check.')
    return output


result = agent.run_sync('Recommend a laptop')
# Output will be validated and retried if needed
```

### Dynamic System Prompts

Use dynamic system prompts that are re-evaluated on every run, including runs that continue from an existing message history.

```python
from datetime import datetime

from pydantic_ai import Agent, RunContext

agent = Agent('openai:gpt-4o')


@agent.system_prompt(dynamic=True)
async def time_aware_prompt(ctx: RunContext) -> str:
    """System prompt that includes current time."""
    current_time = datetime.now().strftime('%Y-%m-%d %H:%M')
    return f'Current time is {current_time}. Use this in your responses when relevant.'


# Prompt is re-evaluated on each run, even when message_history is passed
result = agent.run_sync('What time is it?')
```

### Structured Dict Output

Use untyped dictionaries with JSON schema validation.

```python
from pydantic_ai import Agent, StructuredDict

# Define the output as a JSON schema instead of creating a Pydantic model
schema = {
    'type': 'object',
    'properties': {
        'name': {'type': 'string'},
        'age': {'type': 'integer'},
        'email': {'type': 'string'},
        'hobbies': {'type': 'array', 'items': {'type': 'string'}}
    },
    'required': ['name', 'age', 'email', 'hobbies']
}

agent = Agent('openai:gpt-4o', output_type=StructuredDict(schema))

result = agent.run_sync('Tell me about a fictional person named Alice')
print(result.output)
# Output: {'name': 'Alice', 'age': 28, 'email': 'alice@example.com', 'hobbies': ['reading', 'hiking']}
```

### ToolOutput and NativeOutput Modes

Control how structured outputs are generated.

```python
from pydantic import BaseModel
from pydantic_ai import Agent, ToolOutput, NativeOutput


class Response(BaseModel):
    answer: str
    confidence: float


# ToolOutput: Model calls a tool to return structured data
tool_agent = Agent('openai:gpt-4o', output_type=ToolOutput(Response))

# NativeOutput: Model uses the provider's native structured output feature, where supported
native_agent = Agent('openai:gpt-4o', output_type=NativeOutput(Response))

# Both produce the same typed output
result1 = tool_agent.run_sync('What is 2+2?')
result2 = native_agent.run_sync('What is 2+2?')
```

### End Strategy Control

Control what happens to remaining tool calls once a final result has been found.

```python
from pydantic_ai import Agent, RunContext

# Early: Stop at first successful response
early_agent = Agent('openai:gpt-4o', end_strategy='early')

# Exhaustive: Execute all requested tool calls before responding
exhaustive_agent = Agent('openai:gpt-4o', end_strategy='exhaustive')


@exhaustive_agent.tool
async def task_a(ctx: RunContext) -> str:
    return 'Task A complete'


@exhaustive_agent.tool
async def task_b(ctx: RunContext) -> str:
    return 'Task B complete'


# With exhaustive, both tools will be called even if model could respond earlier
result = exhaustive_agent.run_sync('Run task A and task B')
```

### Advanced Toolset Management

Use specialized toolsets for flexible tool organization.

```python
from pydantic_ai import Agent
from pydantic_ai.toolsets import (
    ApprovalRequiredToolset,
    CombinedToolset,
    FilteredToolset,
    FunctionToolset,
    PrefixedToolset,
    RenamedToolset
)

# Placeholder toolsets; in a real application these hold your registered tools
toolset1 = FunctionToolset()
toolset2 = FunctionToolset()
original_toolset = FunctionToolset()

# Combine multiple toolsets
combined = CombinedToolset([toolset1, toolset2])

# Filter tools with a predicate on each tool definition
filtered = FilteredToolset(
    original_toolset,
    filter_func=lambda ctx, tool_def: tool_def.name.startswith(('get_', 'fetch_'))
)

# Add a prefix to all tool names (the prefix and name are joined with an underscore)
prefixed = PrefixedToolset(original_toolset, prefix='admin')

# Rename specific tools (the mapping is keyed by the new name)
renamed = RenamedToolset(original_toolset, {'new_name': 'old_name'})

# Require approval for all tools in a toolset
approval_required = ApprovalRequiredToolset(original_toolset)

agent = Agent('openai:gpt-4o', toolsets=[combined, filtered])
```

## Summary

Pydantic AI brings the FastAPI development experience to GenAI applications through type-safe agents, dependency injection, structured outputs, and comprehensive testing support. The framework's main use cases include building customer support agents with database integration, implementing RAG systems with vector search, creating multi-tool workflows that orchestrate API calls, developing production-ready AI applications with observability and error handling, building durable agents that handle long-running workflows, and creating multi-agent systems with A2A protocol support. Agents are designed as reusable components that can be instantiated once and used throughout an application. The framework excels at integration patterns through its dependency injection system, which cleanly separates business logic from AI orchestration, and its unified model interface, which allows switching between providers by changing only the model identifier.

Advanced features include builtin tools for web search, code execution, memory, image generation, and MCP server integration; human-in-the-loop workflows with tool approval; durable execution via Prefect, Temporal, and DBOS for fault-tolerant agents; AG-UI protocol for interactive streaming applications; multi-agent coordination via A2A protocol; and flexible toolset management with filtering, prefixing, and approval requirements. With support for over 25 providers including local models via Ollama, unified access through LiteLLM, and routing via OpenRouter, developers can choose the best model for their needs without vendor lock-in. Streaming support enables real-time user experiences with both text and structured output, while the `AgentRun` API provides fine-grained control over agent execution by allowing iteration through graph nodes. Testing utilities (TestModel, FunctionModel, and agent overrides) make it straightforward to write comprehensive unit tests without hitting real LLM APIs. The result is a framework that makes it easy to build, test, deploy, and maintain production-grade AI agents with the confidence and ergonomics of modern Python development.