Tavily Python
https://github.com/tavily-ai/tavily-python
A Python wrapper for the Tavily search API
Tokens: 8,660 · Snippets: 52 · Trust Score: 8.4 · Updated: 3 months ago
# Tavily Python Wrapper

Tavily Python is a wrapper library that provides seamless integration with the Tavily API, enabling developers to perform intelligent web searches, extract content from URLs, crawl websites, and build Retrieval-Augmented Generation (RAG) applications. The library offers both synchronous and asynchronous interfaces for flexibility across application architectures.

The wrapper simplifies access to Tavily's search capabilities, including basic, advanced, fast, and ultra-fast search modes; specialized topic searches (general, news, finance); content extraction from multiple URLs simultaneously; website crawling and mapping; and comprehensive research report generation. It includes built-in error handling, token management for LLM contexts, proxy support, and integration with popular frameworks such as OpenAI, LangChain, and MongoDB for hybrid search scenarios.

## API Reference and Code Examples

### Basic Web Search

Perform a comprehensive web search with customizable parameters and retrieve structured results including URLs, titles, content snippets, and relevance scores.
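The examples below assume the client library is installed (published on PyPI as `tavily-python`) and an API key is available. A minimal setup, with the key value as a placeholder:

```shell
# Install the client library (PyPI package name: tavily-python)
pip install tavily-python

# Make the key available to the examples (placeholder value)
export TAVILY_API_KEY="tvly-YOUR_API_KEY"
```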
```python
from tavily import TavilyClient

# Initialize client with API key
tavily_client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# Basic search
response = tavily_client.search("Who is Leo Messi?")

# Advanced search with all parameters
response = tavily_client.search(
    query="artificial intelligence latest developments",
    search_depth="advanced",         # "basic", "advanced", "fast", or "ultra-fast"
    topic="general",                 # "general", "news", or "finance"
    max_results=10,
    include_answer=True,             # Can also be "basic" or "advanced"
    include_raw_content="markdown",  # "markdown" or "text"
    include_images=True,
    time_range="week",               # "day", "week", "month", "year"
    start_date="2024-01-01",         # Alternative to time_range
    end_date="2024-12-31",           # Used with start_date
    include_domains=["arxiv.org", "github.com"],
    exclude_domains=["medium.com"],
    country="US",
    auto_parameters=True,            # Automatically optimize search parameters
    include_favicon=True,            # Include favicon URLs in results
    include_usage=True,              # Include API usage information
    timeout=60
)

# Response structure
# {
#     "query": "Who is Leo Messi?",
#     "answer": "Lionel Messi is an Argentine professional footballer...",
#     "results": [
#         {
#             "title": "Lionel Messi - Wikipedia",
#             "url": "https://en.wikipedia.org/wiki/Lionel_Messi",
#             "content": "Lionel Andrés Messi is an Argentine...",
#             "score": 0.98,
#             "raw_content": "# Lionel Messi\n\nLionel Andrés Messi...",
#             "images": ["https://example.com/messi.jpg"],
#             "favicon": "https://en.wikipedia.org/favicon.ico"
#         }
#     ],
#     "images": ["https://example.com/image1.jpg"],
#     "usage": {"credits_used": 5}
# }

print(f"Answer: {response['answer']}")
for result in response['results']:
    print(f"Title: {result['title']}")
    print(f"URL: {result['url']}")
    print(f"Score: {result['score']}")
    print(f"Content: {result['content'][:200]}...\n")
```

### Content Extraction from URLs

Extract raw content from multiple URLs simultaneously (up to 20) with optional image extraction, format control, and query-based chunking.
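Since each call accepts at most 20 URLs, larger lists need to be split client-side before extraction. A minimal sketch (the `batched` helper is illustrative, not part of the library):

```python
def batched(urls, size=20):
    """Yield successive chunks of at most `size` URLs."""
    for i in range(0, len(urls), size):
        yield urls[i:i + size]

groups = list(batched([f"https://example.com/page/{n}" for n in range(45)]))
print([len(g) for g in groups])  # [20, 20, 5]
```

Each chunk can then be passed to `extract` in turn.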
```python
from tavily import TavilyClient

tavily_client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# Extract content from multiple URLs
urls = [
    "https://en.wikipedia.org/wiki/Artificial_intelligence",
    "https://en.wikipedia.org/wiki/Machine_learning",
    "https://en.wikipedia.org/wiki/Data_science",
    "https://en.wikipedia.org/wiki/Quantum_computing",
    "https://en.wikipedia.org/wiki/Climate_change"
]

response = tavily_client.extract(
    urls=urls,
    include_images=True,
    extract_depth="advanced",             # "basic" or "advanced"
    format="markdown",                    # "markdown" or "text"
    query="machine learning algorithms",  # Optional: extract query-relevant chunks
    chunks_per_source=5,                  # Number of relevant chunks per URL
    include_favicon=True,
    include_usage=True,
    timeout=30
)

# Process successful extractions
for result in response["results"]:
    print(f"URL: {result['url']}")
    print(f"Content Length: {len(result['raw_content'])} chars")
    print(f"Images Found: {len(result.get('images', []))}")
    print(f"Favicon: {result.get('favicon', 'N/A')}")
    print(f"Content Preview: {result['raw_content'][:200]}...")
    print()

# Handle failed extractions
if response["failed_results"]:
    print("Failed URLs:")
    for failed in response["failed_results"]:
        print(f"- {failed['url']}: {failed.get('error', 'Unknown error')}")
```

### Research Report Generation

Create comprehensive research reports on any topic with automatic source gathering, analysis, structured output, and customizable citation formats.
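Research runs complete asynchronously on the server, so results are fetched by polling. The polling loop in the example below can be factored into a reusable helper with a hard deadline; this is a sketch, using a stand-in client object purely for illustration:

```python
import time

def poll_research(client, request_id, interval=5.0, max_wait=300.0):
    """Poll client.get_research(request_id) until it completes or fails.

    `client` is anything exposing get_research(); a stand-in is used below.
    """
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        result = client.get_research(request_id)
        if result["status"] in ("completed", "failed"):
            return result
        time.sleep(interval)
    raise TimeoutError(f"research {request_id!r} did not finish in {max_wait}s")

# Stand-in client that completes on the third poll (illustration only)
class FakeClient:
    def __init__(self):
        self.calls = 0

    def get_research(self, request_id):
        self.calls += 1
        status = "completed" if self.calls >= 3 else "in_progress"
        return {"status": status, "content": "report text"}

result = poll_research(FakeClient(), "req-123", interval=0.0)
print(result["status"])  # completed
```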
```python
from tavily import TavilyClient
import time

tavily_client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# Create a research task
response = tavily_client.research(
    input="Research the latest developments in quantum computing",
    model="pro",            # "mini", "pro", or "auto"
    citation_format="apa",  # "numbered", "mla", "apa", or "chicago"
    timeout=300
)

print(f"Research Request ID: {response['request_id']}")
print(f"Status: {response['status']}")

# Poll for results
request_id = response['request_id']
while True:
    result = tavily_client.get_research(request_id)
    status = result['status']
    print(f"Status: {status}")

    if status == 'completed':
        print("\nResearch Report:")
        print(result['content'])
        print(f"\nSources ({len(result['sources'])} found):")
        for source in result['sources']:
            print(f"- {source['title']}: {source['url']}")
        break
    elif status == 'failed':
        print("Research failed")
        break

    time.sleep(5)  # Wait before polling again
```

### Streaming Research Results

Stream research results in real time as they are generated for immediate processing.

```python
from tavily import TavilyClient

tavily_client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# Create a streaming research task
stream = tavily_client.research(
    input="Research the impact of AI on healthcare",
    model="pro",
    stream=True,
    citation_format="mla"
)

# Process the stream as it arrives
print("Streaming research results:")
for chunk in stream:
    print(chunk.decode('utf-8'), end='', flush=True)
```

### Structured Research Output

Generate research reports with custom JSON schemas for structured data extraction.
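Because the structured result comes back as JSON text, it is worth verifying the schema's `required` fields client-side before using it downstream. A sketch, with `missing_required` as a hypothetical helper name:

```python
import json

def missing_required(payload, schema):
    """Return required schema fields that are absent from a parsed result."""
    return [key for key in schema.get("required", []) if key not in payload]

# Hypothetical structured result, shaped like the schema in the example below
report = json.loads('{"company_name": "Tesla, Inc.", "summary": "..."}')
schema = {"required": ["company_name", "summary", "key_findings"]}
print(missing_required(report, schema))  # ['key_findings']
```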
```python
from tavily import TavilyClient

tavily_client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# Define output schema
output_schema = {
    "title": "ResearchReport",
    "description": "A structured research report on a company",
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "summary": {"type": "string"},
        "key_findings": {
            "type": "array",
            "items": {"type": "string"}
        },
        "market_position": {"type": "string"},
        "recent_news": {
            "type": "array",
            "items": {"type": "string"}
        }
    },
    "required": ["company_name", "summary", "key_findings"]
}

# Create research task with structured output
response = tavily_client.research(
    input="Research Tesla Inc. company information",
    model="pro",
    output_schema=output_schema,
    citation_format="apa"
)

# Get results
request_id = response['request_id']
result = tavily_client.get_research(request_id)
print(result['content'])  # JSON formatted according to schema
```

### Website Crawling

Traverse websites starting from a base URL with depth control, intelligent filtering, and query-based content chunking.
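The `select_paths`/`exclude_paths` filters in the example below take glob-style patterns. Assuming shell-style matching semantics, patterns can be previewed locally with `fnmatch` before spending crawl credits; this is only a client-side approximation (the `keeps` helper is illustrative, and the authoritative filtering happens server-side):

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def keeps(url, select_paths=(), exclude_paths=()):
    """Preview whether a URL's path would pass glob-style path filters."""
    path = urlparse(url).path
    if any(fnmatch(path, pattern) for pattern in exclude_paths):
        return False
    return not select_paths or any(fnmatch(path, pattern) for pattern in select_paths)

urls = [
    "https://wikipedia.org/wiki/Citrus_fruit",
    "https://wikipedia.org/wiki/Talk:Lemon",
    "https://wikipedia.org/wiki/Banana",
]
print([u for u in urls if keeps(u, ["/wiki/Citrus*"], ["/wiki/Talk:*"])])
# ['https://wikipedia.org/wiki/Citrus_fruit']
```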
```python
from tavily import TavilyClient

tavily_client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# Crawl a website with instructions
response = tavily_client.crawl(
    url="https://wikipedia.org/wiki/Lemon",
    max_depth=3,     # Maximum link depth
    max_breadth=10,  # Maximum links per page
    limit=50,        # Maximum total pages
    instructions="Find all pages about citrus fruits",
    select_paths=["/wiki/Citrus*"],  # Include paths matching pattern
    exclude_paths=["/wiki/Talk:*"],  # Exclude talk pages
    select_domains=["wikipedia.org"],
    allow_external=False,
    include_images=True,
    extract_depth="advanced",
    format="markdown",
    chunks_per_source=3,  # Extract relevant chunks per page
    include_favicon=True,
    include_usage=True,
    timeout=60
)

# Process crawled pages
print(f"Total pages crawled: {len(response['results'])}")
for result in response['results']:
    print(f"URL: {result['url']}")
    print(f"Title: {result.get('title', 'N/A')}")
    print(f"Content preview: {result['raw_content'][:200]}...")
    print()
```

### Website Mapping

Discover and visualize the structure of a website without extracting full content.

```python
from tavily import TavilyClient

tavily_client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# Map website structure
response = tavily_client.map(
    url="https://wikipedia.org/wiki/Lemon",
    max_depth=2,
    max_breadth=20,
    limit=30,
    instructions="Find pages about citrus fruits",
    select_paths=["/wiki/*"],
    exclude_domains=["en.m.wikipedia.org"],
    allow_external=False,
    include_usage=True,
    timeout=60
)

# Display site structure
print(f"Total URLs discovered: {len(response['results'])}")
for idx, result in enumerate(response['results'], 1):
    print(f"{idx}. {result['url']}")
```

### Async Client Usage

Use the asynchronous client for non-blocking operations in async applications.
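When issuing many concurrent requests, it is prudent to cap the number in flight rather than launching them all at once. A sketch using `asyncio.Semaphore`, with a stand-in async client in place of `AsyncTavilyClient`:

```python
import asyncio

async def bounded_search(client, queries, limit=3):
    """Run client.search for each query, at most `limit` in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def one(query):
        async with sem:
            return await client.search(query)

    # gather preserves the order of the input queries
    return await asyncio.gather(*(one(q) for q in queries))

# Stand-in async client (illustration only)
class FakeAsyncClient:
    async def search(self, query):
        await asyncio.sleep(0)
        return {"query": query, "results": []}

out = asyncio.run(bounded_search(FakeAsyncClient(), ["a", "b", "c", "d"]))
print(len(out))  # 4
```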
```python
import asyncio
from tavily import AsyncTavilyClient

async def perform_searches():
    # Initialize async client
    tavily_client = AsyncTavilyClient(api_key="tvly-YOUR_API_KEY")

    # Perform multiple searches concurrently
    search1 = tavily_client.search("Python programming", max_results=5)
    search2 = tavily_client.search("Machine learning", max_results=5)
    extract_task = tavily_client.extract(["https://example.com/article"])

    # Await all operations
    results = await asyncio.gather(search1, search2, extract_task)

    print(f"Search 1: {len(results[0]['results'])} results")
    print(f"Search 2: {len(results[1]['results'])} results")
    print(f"Extract: {len(results[2]['results'])} pages extracted")

    return results

# Run async function
results = asyncio.run(perform_searches())
```

### Async Research with Streaming

Stream research results asynchronously for real-time processing in async applications.

```python
import asyncio
from tavily import AsyncTavilyClient

async def stream_research():
    tavily_client = AsyncTavilyClient(api_key="tvly-YOUR_API_KEY")

    # Create streaming research task
    stream = await tavily_client.research(
        input="Research climate change solutions",
        model="pro",
        stream=True,
        citation_format="chicago"
    )

    # Process stream chunks
    async for chunk in stream:
        print(chunk.decode('utf-8'), end='', flush=True)

# Run async streaming
asyncio.run(stream_research())
```

### OpenAI Assistant Integration

Integrate Tavily search as a function-calling tool for OpenAI assistants.
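At its core, the function-calling branch of the assistant loop in the example below is a name-to-callable dispatch: decode the tool arguments, run the handler, re-encode the output. This sketch isolates that step with a stub standing in for the real search call (`dispatch` and the `TOOLS` registry are illustrative, not part of either library):

```python
import json

# Registry mapping tool names to callables; the real loop would register
# the actual tavily_search function here (a stub stands in for it)
TOOLS = {
    "tavily_search": lambda query: {"answer": f"stub results for {query!r}"},
}

def dispatch(tool_name, arguments_json):
    """Decode tool-call arguments, invoke the handler, re-encode the output."""
    args = json.loads(arguments_json)
    return json.dumps(TOOLS[tool_name](**args))

print(dispatch("tavily_search", '{"query": "latest AI news"}'))
```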
```python
import os
import json
import time
from openai import OpenAI
from tavily import TavilyClient

# Initialize clients
openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

# Define search function
def tavily_search(query):
    return tavily_client.search(
        query,
        search_depth="advanced",
        max_results=5,
        include_answer=True
    )

# Create assistant with function calling
assistant = openai_client.beta.assistants.create(
    instructions="""You are a research assistant that uses web search
    to provide accurate, up-to-date information. Always cite sources.""",
    model="gpt-4-1106-preview",
    tools=[{
        "type": "function",
        "function": {
            "name": "tavily_search",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query"
                    }
                },
                "required": ["query"]
            }
        }
    }]
)

# Create thread and message
thread = openai_client.beta.threads.create()
message = openai_client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What are the latest developments in AI?"
)

# Run assistant
run = openai_client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)

# Poll the run until it reaches a terminal state, submitting
# tool outputs whenever function calls are requested
while run.status not in ("completed", "failed", "cancelled", "expired"):
    if run.status == "requires_action":
        tool_calls = run.required_action.submit_tool_outputs.tool_calls
        tool_outputs = []

        for tool in tool_calls:
            if tool.function.name == "tavily_search":
                args = json.loads(tool.function.arguments)
                output = json.dumps(tavily_search(args["query"]))
                tool_outputs.append({
                    "tool_call_id": tool.id,
                    "output": output
                })

        run = openai_client.beta.threads.runs.submit_tool_outputs(
            thread_id=thread.id,
            run_id=run.id,
            tool_outputs=tool_outputs
        )

    time.sleep(1)
    run = openai_client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id
    )

# Get response
messages = openai_client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```

### Hybrid RAG with MongoDB

Combine local database search with web search for comprehensive RAG applications.

```python
import os
from datetime import datetime

from pymongo import MongoClient
from tavily import TavilyHybridClient

# Connect to MongoDB
db = MongoClient("mongodb://localhost:27017/")["my_database"]

# Initialize hybrid client
hybrid_rag = TavilyHybridClient(
    api_key=os.environ["TAVILY_API_KEY"],
    db_provider='mongodb',
    collection=db.get_collection('documents'),
    index='vector_search',          # MongoDB vector search index name
    embeddings_field='embeddings',  # Field containing embeddings
    content_field='content'         # Field containing text content
)

# Define save function for new documents
def save_document(document):
    # Only save high-quality results
    if document['score'] < 0.5:
        return None
    return {
        'content': document['content'],
        'site_title': document['title'],
        'site_url': document['url'],
        'added_at': datetime.now(),
        'score': document['score']
    }

# Search combining local + web results
results = hybrid_rag.search(
    query="Who is Leo Messi?",
    max_results=10,             # Total results to return
    max_local=5,                # Max from local database
    max_foreign=5,              # Max from web search
    save_foreign=save_document  # Save new findings to DB
)

# Process results
for result in results:
    source = result.get('source', 'unknown')
    print(f"Source: {source}")
    print(f"Content: {result['content'][:200]}...")
    if 'url' in result:
        print(f"URL: {result['url']}")
    print()

# Subsequent searches use saved data
results = hybrid_rag.search(
    "Where did Messi start his career?",
    max_results=5,
    save_foreign=True  # Auto-save with default formatting
)
```

### Error Handling

Comprehensive error handling for all API operations.

```python
from tavily import TavilyClient
from tavily.errors import (
    MissingAPIKeyError,
    InvalidAPIKeyError,
    UsageLimitExceededError,
    BadRequestError,
    ForbiddenError,
    TimeoutError
)

try:
    # Initialize client
    tavily_client = TavilyClient(api_key="tvly-YOUR_API_KEY")

    # Perform search
    response = tavily_client.search(
        query="test query",
        search_depth="advanced",
        max_results=10,
        timeout=30
    )
except MissingAPIKeyError:
    print("No API key provided. Set TAVILY_API_KEY environment variable.")
except InvalidAPIKeyError as e:
    print(f"Invalid API key: {e}")
except UsageLimitExceededError as e:
    print(f"Usage limit exceeded: {e}")
    print("Check your plan at tavily.com")
except BadRequestError as e:
    print(f"Bad request: {e}")
    print("Check your query parameters")
except ForbiddenError as e:
    print(f"Access forbidden: {e}")
except TimeoutError as e:
    print(f"Request timed out: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```

### Proxy Configuration

Configure HTTP/HTTPS proxies for API requests.
```python
from tavily import TavilyClient
import os

# Method 1: Pass proxies to constructor
tavily_client = TavilyClient(
    api_key="tvly-YOUR_API_KEY",
    proxies={
        "http": "http://proxy.example.com:8080",
        "https": "https://proxy.example.com:8080"
    }
)

# Method 2: Use environment variables
os.environ["TAVILY_HTTP_PROXY"] = "http://proxy.example.com:8080"
os.environ["TAVILY_HTTPS_PROXY"] = "https://proxy.example.com:8080"
tavily_client = TavilyClient(api_key="tvly-YOUR_API_KEY")

# Method 3: API key from environment
os.environ["TAVILY_API_KEY"] = "tvly-YOUR_API_KEY"
tavily_client = TavilyClient()  # Auto-loads from environment

# Perform search through proxy
response = tavily_client.search("proxy test")
print(response)
```

### Custom API Base URL

Override the default API base URL for custom deployments or testing.

```python
from tavily import TavilyClient

# Use custom API endpoint
tavily_client = TavilyClient(
    api_key="tvly-YOUR_API_KEY",
    api_base_url="https://custom-api.example.com"
)

response = tavily_client.search("custom endpoint test")
print(response)
```

## Summary and Integration

The Tavily Python wrapper provides a production-ready solution for integrating intelligent web search capabilities into Python applications. The library excels in multiple scenarios: real-time web search with multiple speed/depth options (basic, advanced, fast, ultra-fast), content extraction from multiple URLs with smart chunking, website crawling and mapping for comprehensive site analysis, and automated research report generation with structured outputs and multiple citation formats. The Research API is particularly powerful for generating comprehensive reports with automatic source gathering, analysis, and customizable structured outputs using JSON schemas.
The library's main use cases include:

- Building RAG systems for chatbots and knowledge bases, with the Research API providing long-form context
- Integrating real-time web search into AI assistants (such as OpenAI's GPT models) using function calling
- Extracting and processing content from multiple web sources with query-based relevance filtering
- Performing comprehensive research with automated source gathering and structured outputs
- Creating hybrid search systems that combine local databases with web results

Integration is straightforward, with support for both synchronous and asynchronous programming models, built-in error handling for production environments, proxy support for enterprise networks, streaming support for real-time results, and ready-to-use examples for popular frameworks including OpenAI Assistants, LangChain, and MongoDB vector search.