Tavily Python
https://github.com/tavily-ai/tavily-python
A Python wrapper for the Tavily search API
Tokens: 8,660 · Snippets: 52 · Trust Score: 8.4 · Updated: 3 months ago
# Tavily Python Wrapper

Tavily Python is a wrapper library that provides seamless integration with the Tavily API, enabling developers to perform intelligent web searches, extract content from URLs, crawl websites, and build Retrieval-Augmented Generation (RAG) applications. The library offers both synchronous and asynchronous interfaces for flexibility across application architectures.

The wrapper simplifies access to Tavily's search capabilities, including basic, advanced, fast, and ultra-fast search modes; specialized topic searches (general, news, finance); content extraction from multiple URLs simultaneously; website crawling and mapping; and comprehensive research report generation. It includes built-in error handling, token management for LLM contexts, proxy support, and integration with popular frameworks such as OpenAI, LangChain, and MongoDB for hybrid search scenarios.

## API Reference and Code Examples

### Basic Web Search

Perform a comprehensive web search with customizable parameters and retrieve structured results including URLs, titles, content snippets, and relevance scores.
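The examples below assume the client library is installed (published on PyPI as `tavily-python`) and an API key is available. A minimal setup, with the key value as a placeholder:

```shell
# Install the client library (PyPI package name: tavily-python)
pip install tavily-python

# Make the key available to the examples (placeholder value)
export TAVILY_API_KEY="tvly-YOUR_API_KEY"
```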
```python
from tavily import TavilyClient

# Initialize client with API key
tavily_client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# Basic search
response = tavily_client.search("Who is Leo Messi?")

# Advanced search with all parameters
response = tavily_client.search(
    query="artificial intelligence latest developments",
    search_depth="advanced",         # "basic", "advanced", "fast", or "ultra-fast"
    topic="general",                 # "general", "news", or "finance"
    max_results=10,
    include_answer=True,             # Can also be "basic" or "advanced"
    include_raw_content="markdown",  # "markdown" or "text"
    include_images=True,
    time_range="week",               # "day", "week", "month", "year"
    start_date="2024-01-01",         # Alternative to time_range
    end_date="2024-12-31",           # Used with start_date
    include_domains=["arxiv.org", "github.com"],
    exclude_domains=["medium.com"],
    country="US",
    auto_parameters=True,            # Automatically optimize search parameters
    include_favicon=True,            # Include favicon URLs in results
    include_usage=True,              # Include API usage information
    timeout=60
)

# Response structure
# {
#     "query": "Who is Leo Messi?",
#     "answer": "Lionel Messi is an Argentine professional footballer...",
#     "results": [
#         {
#             "title": "Lionel Messi - Wikipedia",
#             "url": "https://en.wikipedia.org/wiki/Lionel_Messi",
#             "content": "Lionel Andrés Messi is an Argentine...",
#             "score": 0.98,
#             "raw_content": "# Lionel Messi\n\nLionel Andrés Messi...",
#             "images": ["https://example.com/messi.jpg"],
#             "favicon": "https://en.wikipedia.org/favicon.ico"
#         }
#     ],
#     "images": ["https://example.com/image1.jpg"],
#     "usage": {"credits_used": 5}
# }

print(f"Answer: {response['answer']}")
for result in response['results']:
    print(f"Title: {result['title']}")
    print(f"URL: {result['url']}")
    print(f"Score: {result['score']}")
    print(f"Content: {result['content'][:200]}...\n")
```

### Content Extraction from URLs

Extract raw content from multiple URLs simultaneously (up to 20) with optional image extraction, format control, and query-based chunking.
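Since each call accepts at most 20 URLs, larger lists need to be split client-side before extraction. A minimal sketch (the `batched` helper is illustrative, not part of the library):

```python
def batched(urls, size=20):
    """Yield successive chunks of at most `size` URLs."""
    for i in range(0, len(urls), size):
        yield urls[i:i + size]

groups = list(batched([f"https://example.com/page/{n}" for n in range(45)]))
print([len(g) for g in groups])  # [20, 20, 5]
```

Each chunk can then be passed to `extract` in turn.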
```python
from tavily import TavilyClient

tavily_client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# Extract content from multiple URLs
urls = [
    "https://en.wikipedia.org/wiki/Artificial_intelligence",
    "https://en.wikipedia.org/wiki/Machine_learning",
    "https://en.wikipedia.org/wiki/Data_science",
    "https://en.wikipedia.org/wiki/Quantum_computing",
    "https://en.wikipedia.org/wiki/Climate_change"
]

response = tavily_client.extract(
    urls=urls,
    include_images=True,
    extract_depth="advanced",             # "basic" or "advanced"
    format="markdown",                    # "markdown" or "text"
    query="machine learning algorithms",  # Optional: extract query-relevant chunks
    chunks_per_source=5,                  # Number of relevant chunks per URL
    include_favicon=True,
    include_usage=True,
    timeout=30
)

# Process successful extractions
for result in response["results"]:
    print(f"URL: {result['url']}")
    print(f"Content Length: {len(result['raw_content'])} chars")
    print(f"Images Found: {len(result.get('images', []))}")
    print(f"Favicon: {result.get('favicon', 'N/A')}")
    print(f"Content Preview: {result['raw_content'][:200]}...")
    print()

# Handle failed extractions
if response["failed_results"]:
    print("Failed URLs:")
    for failed in response["failed_results"]:
        print(f"- {failed['url']}: {failed.get('error', 'Unknown error')}")
```

### Research Report Generation

Create comprehensive research reports on any topic with automatic source gathering, analysis, structured output, and customizable citation formats.
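Research runs complete asynchronously on the server, so results are fetched by polling. The polling loop in the example below can be factored into a reusable helper with a hard deadline; this is a sketch, using a stand-in client object purely for illustration:

```python
import time

def poll_research(client, request_id, interval=5.0, max_wait=300.0):
    """Poll client.get_research(request_id) until it completes or fails.

    `client` is anything exposing get_research(); a stand-in is used below.
    """
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        result = client.get_research(request_id)
        if result["status"] in ("completed", "failed"):
            return result
        time.sleep(interval)
    raise TimeoutError(f"research {request_id!r} did not finish in {max_wait}s")

# Stand-in client that completes on the third poll (illustration only)
class FakeClient:
    def __init__(self):
        self.calls = 0

    def get_research(self, request_id):
        self.calls += 1
        status = "completed" if self.calls >= 3 else "in_progress"
        return {"status": status, "content": "report text"}

result = poll_research(FakeClient(), "req-123", interval=0.0)
print(result["status"])  # completed
```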
```python
from tavily import TavilyClient
import time

tavily_client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# Create a research task
response = tavily_client.research(
    input="Research the latest developments in quantum computing",
    model="pro",            # "mini", "pro", or "auto"
    citation_format="apa",  # "numbered", "mla", "apa", or "chicago"
    timeout=300
)

print(f"Research Request ID: {response['request_id']}")
print(f"Status: {response['status']}")

# Poll for results
request_id = response['request_id']
while True:
    result = tavily_client.get_research(request_id)
    status = result['status']
    print(f"Status: {status}")

    if status == 'completed':
        print("\nResearch Report:")
        print(result['content'])
        print(f"\nSources ({len(result['sources'])} found):")
        for source in result['sources']:
            print(f"- {source['title']}: {source['url']}")
        break
    elif status == 'failed':
        print("Research failed")
        break

    time.sleep(5)  # Wait before polling again
```

### Streaming Research Results

Stream research results in real time as they are generated for immediate processing.

```python
from tavily import TavilyClient

tavily_client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# Create a streaming research task
stream = tavily_client.research(
    input="Research the impact of AI on healthcare",
    model="pro",
    stream=True,
    citation_format="mla"
)

# Process the stream as it arrives
print("Streaming research results:")
for chunk in stream:
    print(chunk.decode('utf-8'), end='', flush=True)
```

### Structured Research Output

Generate research reports with custom JSON schemas for structured data extraction.
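Because the structured result comes back as JSON text, it is worth verifying the schema's `required` fields client-side before using it downstream. A sketch, with `missing_required` as a hypothetical helper name:

```python
import json

def missing_required(payload, schema):
    """Return required schema fields that are absent from a parsed result."""
    return [key for key in schema.get("required", []) if key not in payload]

# Hypothetical structured result, shaped like the schema in the example below
report = json.loads('{"company_name": "Tesla, Inc.", "summary": "..."}')
schema = {"required": ["company_name", "summary", "key_findings"]}
print(missing_required(report, schema))  # ['key_findings']
```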
```python
from tavily import TavilyClient

tavily_client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# Define output schema
output_schema = {
    "title": "ResearchReport",
    "description": "A structured research report on a company",
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "summary": {"type": "string"},
        "key_findings": {
            "type": "array",
            "items": {"type": "string"}
        },
        "market_position": {"type": "string"},
        "recent_news": {
            "type": "array",
            "items": {"type": "string"}
        }
    },
    "required": ["company_name", "summary", "key_findings"]
}

# Create research task with structured output
response = tavily_client.research(
    input="Research Tesla Inc. company information",
    model="pro",
    output_schema=output_schema,
    citation_format="apa"
)

# Get results
request_id = response['request_id']
result = tavily_client.get_research(request_id)
print(result['content'])  # JSON formatted according to schema
```

### Website Crawling

Traverse websites starting from a base URL with depth control, intelligent filtering, and query-based content chunking.
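The `select_paths`/`exclude_paths` filters in the example below take glob-style patterns. Assuming shell-style matching semantics, patterns can be previewed locally with `fnmatch` before spending crawl credits; this is only a client-side approximation (the `keeps` helper is illustrative, and the authoritative filtering happens server-side):

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def keeps(url, select_paths=(), exclude_paths=()):
    """Preview whether a URL's path would pass glob-style path filters."""
    path = urlparse(url).path
    if any(fnmatch(path, pattern) for pattern in exclude_paths):
        return False
    return not select_paths or any(fnmatch(path, pattern) for pattern in select_paths)

urls = [
    "https://wikipedia.org/wiki/Citrus_fruit",
    "https://wikipedia.org/wiki/Talk:Lemon",
    "https://wikipedia.org/wiki/Banana",
]
print([u for u in urls if keeps(u, ["/wiki/Citrus*"], ["/wiki/Talk:*"])])
# ['https://wikipedia.org/wiki/Citrus_fruit']
```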
```python
from tavily import TavilyClient

tavily_client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# Crawl a website with instructions
response = tavily_client.crawl(
    url="https://wikipedia.org/wiki/Lemon",
    max_depth=3,     # Maximum link depth
    max_breadth=10,  # Maximum links per page
    limit=50,        # Maximum total pages
    instructions="Find all pages about citrus fruits",
    select_paths=["/wiki/Citrus*"],  # Include paths matching pattern
    exclude_paths=["/wiki/Talk:*"],  # Exclude talk pages
    select_domains=["wikipedia.org"],
    allow_external=False,
    include_images=True,
    extract_depth="advanced",
    format="markdown",
    chunks_per_source=3,  # Extract relevant chunks per page
    include_favicon=True,
    include_usage=True,
    timeout=60
)

# Process crawled pages
print(f"Total pages crawled: {len(response['results'])}")
for result in response['results']:
    print(f"URL: {result['url']}")
    print(f"Title: {result.get('title', 'N/A')}")
    print(f"Content preview: {result['raw_content'][:200]}...")
    print()
```

### Website Mapping

Discover and visualize the structure of a website without extracting full content.

```python
from tavily import TavilyClient

tavily_client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# Map website structure
response = tavily_client.map(
    url="https://wikipedia.org/wiki/Lemon",
    max_depth=2,
    max_breadth=20,
    limit=30,
    instructions="Find pages about citrus fruits",
    select_paths=["/wiki/*"],
    exclude_domains=["en.m.wikipedia.org"],
    allow_external=False,
    include_usage=True,
    timeout=60
)

# Display site structure
print(f"Total URLs discovered: {len(response['results'])}")
for idx, result in enumerate(response['results'], 1):
    print(f"{idx}. {result['url']}")
```

### Async Client Usage

Use the asynchronous client for non-blocking operations in async applications.
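When issuing many concurrent requests, it is prudent to cap the number in flight rather than launching them all at once. A sketch using `asyncio.Semaphore`, with a stand-in async client in place of `AsyncTavilyClient`:

```python
import asyncio

async def bounded_search(client, queries, limit=3):
    """Run client.search for each query, at most `limit` in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def one(query):
        async with sem:
            return await client.search(query)

    # gather preserves the order of the input queries
    return await asyncio.gather(*(one(q) for q in queries))

# Stand-in async client (illustration only)
class FakeAsyncClient:
    async def search(self, query):
        await asyncio.sleep(0)
        return {"query": query, "results": []}

out = asyncio.run(bounded_search(FakeAsyncClient(), ["a", "b", "c", "d"]))
print(len(out))  # 4
```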
```python
import asyncio
from tavily import AsyncTavilyClient

async def perform_searches():
    # Initialize async client
    tavily_client = AsyncTavilyClient(api_key="tvly-YOUR_API_KEY")

    # Perform multiple searches concurrently
    search1 = tavily_client.search("Python programming", max_results=5)
    search2 = tavily_client.search("Machine learning", max_results=5)
    extract_task = tavily_client.extract(["https://example.com/article"])

    # Await all operations
    results = await asyncio.gather(search1, search2, extract_task)

    print(f"Search 1: {len(results[0]['results'])} results")
    print(f"Search 2: {len(results[1]['results'])} results")
    print(f"Extract: {len(results[2]['results'])} pages extracted")

    return results

# Run async function
results = asyncio.run(perform_searches())
```

### Async Research with Streaming

Stream research results asynchronously for real-time processing in async applications.

```python
import asyncio
from tavily import AsyncTavilyClient

async def stream_research():
    tavily_client = AsyncTavilyClient(api_key="tvly-YOUR_API_KEY")

    # Create streaming research task
    stream = await tavily_client.research(
        input="Research climate change solutions",
        model="pro",
        stream=True,
        citation_format="chicago"
    )

    # Process stream chunks
    async for chunk in stream:
        print(chunk.decode('utf-8'), end='', flush=True)

# Run async streaming
asyncio.run(stream_research())
```

### OpenAI Assistant Integration

Integrate Tavily search as a function-calling tool for OpenAI assistants.
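At its core, the function-calling branch of the assistant loop in the example below is a name-to-callable dispatch: decode the tool arguments, run the handler, re-encode the output. This sketch isolates that step with a stub standing in for the real search call (`dispatch` and the `TOOLS` registry are illustrative, not part of either library):

```python
import json

# Registry mapping tool names to callables; the real loop would register
# the actual tavily_search function here (a stub stands in for it)
TOOLS = {
    "tavily_search": lambda query: {"answer": f"stub results for {query!r}"},
}

def dispatch(tool_name, arguments_json):
    """Decode tool-call arguments, invoke the handler, re-encode the output."""
    args = json.loads(arguments_json)
    return json.dumps(TOOLS[tool_name](**args))

print(dispatch("tavily_search", '{"query": "latest AI news"}'))
```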
```python
import os
import json
import time
from openai import OpenAI
from tavily import TavilyClient

# Initialize clients
openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

# Define search function
def tavily_search(query):
    return tavily_client.search(
        query,
        search_depth="advanced",
        max_results=5,
        include_answer=True
    )

# Create assistant with function calling
assistant = openai_client.beta.assistants.create(
    instructions="""You are a research assistant that uses web search
    to provide accurate, up-to-date information. Always cite sources.""",
    model="gpt-4-1106-preview",
    tools=[{
        "type": "function",
        "function": {
            "name": "tavily_search",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query"
                    }
                },
                "required": ["query"]
            }
        }
    }]
)

# Create thread and message
thread = openai_client.beta.threads.create()
message = openai_client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What are the latest developments in AI?"
)

# Run assistant
run = openai_client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)

# Poll the run until it reaches a terminal state, submitting
# tool outputs whenever function calls are requested
while run.status not in ("completed", "failed", "cancelled", "expired"):
    if run.status == "requires_action":
        tool_calls = run.required_action.submit_tool_outputs.tool_calls
        tool_outputs = []

        for tool in tool_calls:
            if tool.function.name == "tavily_search":
                args = json.loads(tool.function.arguments)
                output = json.dumps(tavily_search(args["query"]))
                tool_outputs.append({
                    "tool_call_id": tool.id,
                    "output": output
                })

        run = openai_client.beta.threads.runs.submit_tool_outputs(
            thread_id=thread.id,
            run_id=run.id,
            tool_outputs=tool_outputs
        )

    time.sleep(1)
    run = openai_client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id
    )

# Get response
messages = openai_client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```

### Hybrid RAG with MongoDB

Combine local database search with web search for comprehensive RAG applications.

```python
import os
from datetime import datetime

from pymongo import MongoClient
from tavily import TavilyHybridClient

# Connect to MongoDB
db = MongoClient("mongodb://localhost:27017/")["my_database"]

# Initialize hybrid client
hybrid_rag = TavilyHybridClient(
    api_key=os.environ["TAVILY_API_KEY"],
    db_provider='mongodb',
    collection=db.get_collection('documents'),
    index='vector_search',          # MongoDB vector search index name
    embeddings_field='embeddings',  # Field containing embeddings
    content_field='content'         # Field containing text content
)

# Define save function for new documents
def save_document(document):
    # Only save high-quality results
    if document['score'] < 0.5:
        return None
    return {
        'content': document['content'],
        'site_title': document['title'],
        'site_url': document['url'],
        'added_at': datetime.now(),
        'score': document['score']
    }

# Search combining local + web results
results = hybrid_rag.search(
    query="Who is Leo Messi?",
    max_results=10,             # Total results to return
    max_local=5,                # Max from local database
    max_foreign=5,              # Max from web search
    save_foreign=save_document  # Save new findings to DB
)

# Process results
for result in results:
    source = result.get('source', 'unknown')
    print(f"Source: {source}")
    print(f"Content: {result['content'][:200]}...")
    if 'url' in result:
        print(f"URL: {result['url']}")
    print()

# Subsequent searches use saved data
results = hybrid_rag.search(
    "Where did Messi start his career?",
    max_results=5,
    save_foreign=True  # Auto-save with default formatting
)
```

### Error Handling

Comprehensive error handling for all API operations.

```python
from tavily import TavilyClient
from tavily.errors import (
    MissingAPIKeyError,
    InvalidAPIKeyError,
    UsageLimitExceededError,
    BadRequestError,
    ForbiddenError,
    TimeoutError
)

try:
    # Initialize client
    tavily_client = TavilyClient(api_key="tvly-YOUR_API_KEY")

    # Perform search
    response = tavily_client.search(
        query="test query",
        search_depth="advanced",
        max_results=10,
        timeout=30
    )
except MissingAPIKeyError:
    print("No API key provided. Set TAVILY_API_KEY environment variable.")
except InvalidAPIKeyError as e:
    print(f"Invalid API key: {e}")
except UsageLimitExceededError as e:
    print(f"Usage limit exceeded: {e}")
    print("Check your plan at tavily.com")
except BadRequestError as e:
    print(f"Bad request: {e}")
    print("Check your query parameters")
except ForbiddenError as e:
    print(f"Access forbidden: {e}")
except TimeoutError as e:
    print(f"Request timed out: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```

### Proxy Configuration

Configure HTTP/HTTPS proxies for API requests.
```python
from tavily import TavilyClient
import os

# Method 1: Pass proxies to constructor
tavily_client = TavilyClient(
    api_key="tvly-YOUR_API_KEY",
    proxies={
        "http": "http://proxy.example.com:8080",
        "https": "https://proxy.example.com:8080"
    }
)

# Method 2: Use environment variables
os.environ["TAVILY_HTTP_PROXY"] = "http://proxy.example.com:8080"
os.environ["TAVILY_HTTPS_PROXY"] = "https://proxy.example.com:8080"
tavily_client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# Method 3: API key from environment
os.environ["TAVILY_API_KEY"] = "tvly-YOUR_API_KEY"
tavily_client = TavilyClient()  # Auto-loads from environment

# Perform search through proxy
response = tavily_client.search("proxy test")
print(response)
```

### Custom API Base URL

Override the default API base URL for custom deployments or testing.

```python
from tavily import TavilyClient

# Use custom API endpoint
tavily_client = TavilyClient(
    api_key="tvly-YOUR_API_KEY",
    api_base_url="https://custom-api.example.com"
)

response = tavily_client.search("custom endpoint test")
print(response)
```

## Summary and Integration

The Tavily Python wrapper provides a production-ready solution for integrating intelligent web search capabilities into Python applications. The library excels in multiple scenarios: real-time web search with multiple speed/depth options (basic, advanced, fast, ultra-fast), content extraction from multiple URLs with smart chunking, website crawling and mapping for comprehensive site analysis, and automated research report generation with structured outputs and multiple citation formats. The Research API is particularly powerful for generating comprehensive reports with automatic source gathering, analysis, and customizable structured outputs using JSON schemas.
The library's main use cases include:

- Building RAG systems for chatbots and knowledge bases, with the Research API providing long-form context
- Integrating real-time web search into AI assistants (such as OpenAI's GPT models) using function calling
- Extracting and processing content from multiple web sources with query-based relevance filtering
- Performing comprehensive research with automated source gathering and structured outputs
- Creating hybrid search systems that combine local databases with web results

Integration is straightforward, with support for both synchronous and asynchronous programming models, built-in error handling for production environments, proxy support for enterprise networks, streaming support for real-time results, and ready-to-use examples for popular frameworks including OpenAI Assistants, LangChain, and MongoDB vector search.