### Set Up Environment Variables
Source: https://github.com/langchain-ai/open_deep_research/blob/main/README.md
Copy the example environment file to create your own .env file for custom configuration settings.
```bash
cp .env.example .env
```
--------------------------------
### Launch LangGraph Server
Source: https://github.com/langchain-ai/open_deep_research/blob/main/README.md
Start the LangGraph server locally with the Open Deep Research project. This command installs dependencies and launches the development server, providing API and UI endpoints.
```bash
# Install dependencies and start the LangGraph server
uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev --allow-blocking
```
--------------------------------
### Install Dependencies
Source: https://github.com/langchain-ai/open_deep_research/blob/main/README.md
Install project dependencies using uv. You can sync all dependencies or install them from the pyproject.toml file.
```bash
uv sync
# or
uv pip install -r pyproject.toml
```
--------------------------------
### Programmatic Evaluation Setup
Source: https://context7.com/langchain-ai/open_deep_research/llms.txt
This Python code snippet shows how to set up and run an evaluation programmatically. It defines a configuration dictionary specifying various models for research, summarization, compression, and final reporting, along with the search API to be used. The comments outline the nine quality criteria evaluated.
```python
# Programmatic evaluation setup
from tests.evaluators import evaluate_report_quality
from tests.run_evaluate import run_evaluation
# Run evaluation with custom configuration
config = {
    "configurable": {
        "research_model": "anthropic:claude-sonnet-4-20250514",
        "summarization_model": "openai:gpt-4.1-mini",
        "compression_model": "openai:gpt-4.1",
        "final_report_model": "openai:gpt-4.1",
        "search_api": "tavily"
    }
}
# Evaluation checks 9 quality criteria:
# 1. Topic Relevance - Does report address the topic?
# 2. Section Relevance - Are all sections relevant?
# 3. Structure and Flow - Logical narrative flow?
# 4. Introduction Quality - Proper context and scope?
# 5. Conclusion Quality - Summarizes key findings?
# 6. Structural Elements - Tables, lists usage?
# 7. Section Headers - Correct Markdown formatting?
# 8. Citations - Proper source citation?
# 9. Overall Quality - Well-researched and accurate?
```
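The nine criteria listed above can be tallied with a few lines of plain Python. This is only a sketch: `CRITERIA` and `summarize_criteria` are hypothetical names, and it assumes each criterion yields a simple pass/fail boolean, which may not match how the project's evaluators actually report scores.

```python
from typing import Dict, List, Tuple

# Criterion names copied from the comments above.
CRITERIA: Tuple[str, ...] = (
    "Topic Relevance",
    "Section Relevance",
    "Structure and Flow",
    "Introduction Quality",
    "Conclusion Quality",
    "Structural Elements",
    "Section Headers",
    "Citations",
    "Overall Quality",
)


def summarize_criteria(results: Dict[str, bool]) -> Dict[str, object]:
    """Count passed criteria and list the failed ones (hypothetical helper)."""
    unknown = sorted(set(results) - set(CRITERIA))
    if unknown:
        raise KeyError(f"unknown criteria: {unknown}")
    # A criterion that is missing from the results counts as failed.
    failed: List[str] = [name for name in CRITERIA if not results.get(name, False)]
    return {"passed": len(CRITERIA) - len(failed), "total": len(CRITERIA), "failed": failed}


example = {name: True for name in CRITERIA}
example["Citations"] = False
summary = summarize_criteria(example)
print(summary)
```

Treating a missing criterion as failed keeps incomplete evaluations from looking better than they are.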
--------------------------------
### LangGraph Server Deployment Command
Source: https://context7.com/langchain-ai/open_deep_research/llms.txt
Command to install dependencies and start the LangGraph server locally for the deep researcher project. It specifies Python version and uses editable install.
```bash
# Install dependencies and start the LangGraph server
uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev --allow-blocking
# Server endpoints:
# - API: http://127.0.0.1:2024
# - Studio UI: https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024
# - API Docs: http://127.0.0.1:2024/docs
```
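If the server runs on a different host or port, the three endpoints above can be derived from the API base URL. The helper below is a minimal sketch; `server_endpoints` is a hypothetical name, and the URL layout is taken only from the comments in the command listing above.

```python
from typing import Dict
from urllib.parse import urlencode


def server_endpoints(api_base: str = "http://127.0.0.1:2024") -> Dict[str, str]:
    """Build the API, docs, and Studio URLs from one base URL (hypothetical helper)."""
    base = api_base.rstrip("/")
    # Keep ':' and '/' unescaped so the Studio link matches the form shown above.
    query = urlencode({"baseUrl": base}, safe=":/")
    return {
        "api": base,
        "docs": f"{base}/docs",
        "studio": f"https://smith.langchain.com/studio/?{query}",
    }


endpoints = server_endpoints()
for name, url in endpoints.items():
    print(f"{name}: {url}")
```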
--------------------------------
### Environment Configuration for API Keys
Source: https://context7.com/langchain-ai/open_deep_research/llms.txt
Environment variables for API keys and LangSmith tracing. The file lists keys for OpenAI, Anthropic, Google, and Tavily, plus the LangSmith API key, project name, and tracing flag. The provider keys are needed for the models and search API you configure; the LangSmith variables enable tracing and are used when running evaluations.
```bash
# .env file configuration
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
TAVILY_API_KEY=tvly-...
# LangSmith tracing (optional but recommended)
LANGSMITH_API_KEY=ls-...
LANGSMITH_PROJECT=open-deep-research
LANGSMITH_TRACING=true
```
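Before starting the server it can help to check which keys a `.env` file actually defines. The following is a minimal, standard-library sketch; `parse_env_text` and `missing_keys` are hypothetical helpers, the parser handles only simple `KEY=VALUE` lines, and which keys you truly need depends on the models and search API you select.

```python
from typing import Dict, List, Sequence

# Key names copied from the .env listing above; treating all of them as
# expected is an assumption made for this illustration.
EXPECTED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY", "TAVILY_API_KEY")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str], expected: Sequence[str] = EXPECTED_KEYS) -> List[str]:
    """Return the expected keys that are absent or empty."""
    return [key for key in expected if not values.get(key)]


sample = """
# .env file configuration
OPENAI_API_KEY=sk-example
TAVILY_API_KEY="tvly-example"
LANGSMITH_TRACING=true
"""
parsed = parse_env_text(sample)
print(missing_keys(parsed))
```

To check a real file, pass the result of `Path(".env").read_text()` to `parse_env_text`.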
--------------------------------
### Clone Repository and Activate Virtual Environment
Source: https://github.com/langchain-ai/open_deep_research/blob/main/README.md
Clone the repository and set up a virtual environment for the project. Ensure you are using the correct activation command for your operating system.
```bash
git clone https://github.com/langchain-ai/open_deep_research.git
cd open_deep_research
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
```
--------------------------------
### Load Configuration from Environment
Source: https://context7.com/langchain-ai/open_deep_research/llms.txt
Loads configuration from environment variables using dotenv and initializes the Configuration object. Overrides can be applied using runtime configurations.
```python
# Load configuration from environment
import os
from dotenv import load_dotenv
from open_deep_research.configuration import Configuration
load_dotenv()
# Environment variables are automatically loaded
config = Configuration.from_runnable_config({})
print(f"Using model: {config.research_model}")
# Override with runtime config
runtime_config = Configuration.from_runnable_config({
    "configurable": {
        "research_model": "anthropic:claude-sonnet-4-20250514",
        "search_api": "anthropic"  # Use Anthropic's native web search
    }
})
```
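To make the interplay between environment variables and runtime values concrete, here is a standard-library stand-in. `SketchConfiguration` is hypothetical and is not the project's `Configuration` class; its default values and, in particular, its resolution order (environment variable first, then the `configurable` value, then the default) are assumptions, so check the project's configuration module for the actual behavior.

```python
import os
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class SketchConfiguration:
    """Hypothetical two-field stand-in for the real Configuration class."""

    research_model: str = "openai:gpt-4.1"
    search_api: str = "tavily"

    @classmethod
    def from_runnable_config(cls, config: Optional[Dict[str, Any]] = None) -> "SketchConfiguration":
        configurable = (config or {}).get("configurable", {})
        values = {}
        for field in fields(cls):
            # Assumed order: FIELD_NAME env var, then runtime value, then default.
            value = os.environ.get(field.name.upper(), configurable.get(field.name))
            if value is not None:
                values[field.name] = value
        return cls(**values)


# Demonstration; the environment variable is set only temporarily.
os.environ.pop("SEARCH_API", None)
os.environ.pop("RESEARCH_MODEL", None)
runtime = {"configurable": {"search_api": "anthropic"}}
from_defaults = SketchConfiguration.from_runnable_config({})
from_runtime = SketchConfiguration.from_runnable_config(runtime)
os.environ["SEARCH_API"] = "none"
from_env = SketchConfiguration.from_runnable_config(runtime)
del os.environ["SEARCH_API"]
print(from_defaults.search_api, from_runtime.search_api, from_env.search_api)
```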
--------------------------------
### AgentInputState Initialization
Source: https://context7.com/langchain-ai/open_deep_research/llms.txt
Initializes the AgentInputState with a HumanMessage, representing the entry point for user input in the research workflow.
```python
from langchain_core.messages import HumanMessage, AIMessage
# AgentInputState - Entry point for user messages
input_state = {
    "messages": [HumanMessage(content="Research the impact of AI on healthcare")]
}
```
--------------------------------
### Run Deep Research Bench Evaluation
Source: https://context7.com/langchain-ai/open_deep_research/llms.txt
This section provides bash commands to set up environment variables and run the Deep Research Bench evaluation. It includes commands for setting API keys for LangSmith, OpenAI, and Tavily, and then executing the evaluation script. Remember to replace placeholders with your actual keys and experiment names.
```bash
# Set up environment variables
export LANGSMITH_API_KEY="your-langsmith-key"
export LANGSMITH_PROJECT="deep-research-eval"
export OPENAI_API_KEY="your-openai-key"
export TAVILY_API_KEY="your-tavily-key"
# Run evaluation (costs ~$20-$100 depending on model)
python tests/run_evaluate.py
# Extract results for submission
python tests/extract_langsmith_data.py \
--project-name "YOUR_EXPERIMENT_NAME" \
--model-name "gpt-4.1" \
--dataset-name "deep_research_bench"
# Output: tests/expt_results/deep_research_bench_gpt-4.1.jsonl
```
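Before submitting, it may be worth confirming that the extracted file is valid JSONL. The sketch below uses only the standard library and makes no assumption about field names; `load_jsonl` is a hypothetical helper, and the demo writes a throwaway file instead of reading the real output path.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_jsonl(path: Path) -> List[Dict[str, Any]]:
    """Load a JSONL file, requiring every non-blank line to be a JSON object."""
    records: List[Dict[str, Any]] = []
    with path.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(f"line {line_number} is not a JSON object")
            records.append(record)
    return records


# Demo on a temporary file; "example_field" is a placeholder, not the real schema.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.jsonl"
    demo_path.write_text('{"example_field": "a"}\n\n{"example_field": "b"}\n', encoding="utf-8")
    demo_records = load_jsonl(demo_path)
print(f"{len(demo_records)} records loaded")
```

For the real file, call `load_jsonl(Path("tests/expt_results/deep_research_bench_gpt-4.1.jsonl"))`.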
--------------------------------
### Run Comprehensive Evaluation
Source: https://github.com/langchain-ai/open_deep_research/blob/main/README.md
Execute a full evaluation of the project on LangSmith datasets. This command initiates the evaluation process.
```bash
python tests/run_evaluate.py
```
--------------------------------
### Configure MCP Tools Integration
Source: https://context7.com/langchain-ai/open_deep_research/llms.txt
This Python code configures and loads Model Context Protocol (MCP) tools to extend research capabilities. It sets up MCP server configuration, including authentication and available tools, and then loads these tools into the existing toolset. Ensure your MCP server is accessible and configured correctly.
```python
import asyncio
from open_deep_research.utils import load_mcp_tools
from open_deep_research.configuration import MCPConfig
# Configure MCP server with authentication
mcp_config = MCPConfig(
    url="https://your-mcp-server.com",
    tools=["document_search", "database_query", "calendar_check"],
    auth_required=True
)
# Runtime configuration with MCP
config = {
    "configurable": {
        "mcp_config": mcp_config.model_dump(),
        "mcp_prompt": """
        Use document_search for internal company documents.
        Use database_query for structured data lookups.
        Use calendar_check for scheduling information.
        """
    },
    "metadata": {
        "owner": "user-123"
    }
}
# Load MCP tools (called internally by get_all_tools)
async def load_tools_example():
    # Names of tools that are already registered
    existing_tools = {"tavily_search", "think_tool", "ResearchComplete"}
    mcp_tools = await load_mcp_tools(config, existing_tools)
    for tool in mcp_tools:
        print(f"Loaded MCP tool: {tool.name}")
        print(f"Description: {tool.description}")
asyncio.run(load_tools_example())
```
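Passing the existing tool names alongside the configured `tools` list suggests that loading involves some name-based selection. The sketch below illustrates one way such a filter could work; `select_mcp_tool_names` is hypothetical, and the rule it applies (keep tools that are in the allow-list and do not collide with an existing name) is an assumption, not a description of what `load_mcp_tools` does internally.

```python
from typing import Iterable, List, Set


def select_mcp_tool_names(
    server_tools: Iterable[str],
    allowed: Iterable[str],
    existing: Set[str],
) -> List[str]:
    """Keep server tools that are allow-listed and do not clash with existing names."""
    allowed_set = set(allowed)
    selected: List[str] = []
    for name in server_tools:
        if name not in allowed_set:
            continue  # not requested in the MCP configuration
        if name in existing:
            continue  # would shadow a tool that is already registered
        selected.append(name)
    return selected


selected_names = select_mcp_tool_names(
    server_tools=["document_search", "tavily_search", "unlisted_tool"],
    allowed=["document_search", "tavily_search"],
    existing={"tavily_search", "think_tool", "ResearchComplete"},
)
print(selected_names)
```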
--------------------------------
### Configuration Class for Research Agent
Source: https://context7.com/langchain-ai/open_deep_research/llms.txt
Defines all configurable parameters for the research agent, including model selection, search API, and MCP settings. Use `Configuration.from_runnable_config` to load configuration from a dictionary.
```python
from open_deep_research.configuration import Configuration, SearchAPI, MCPConfig
from langchain_core.runnables import RunnableConfig
# Create configuration from environment variables and runtime config
runtime_config: RunnableConfig = {
    "configurable": {
        # Model Configuration
        "summarization_model": "openai:gpt-4.1-mini",  # Summarizes search results
        "summarization_model_max_tokens": 8192,
        "research_model": "openai:gpt-4.1",  # Powers the search agent
        "research_model_max_tokens": 10000,
        "compression_model": "openai:gpt-4.1",  # Compresses research findings
        "compression_model_max_tokens": 8192,
        "final_report_model": "openai:gpt-4.1",  # Writes final report
        "final_report_model_max_tokens": 10000,
        # Search Configuration
        "search_api": "tavily",  # Options: "tavily", "anthropic", "openai", "none"
        "max_content_length": 50000,
        # Research Limits
        "max_researcher_iterations": 6,
        "max_react_tool_calls": 10,
        "max_concurrent_research_units": 5,
        "max_structured_output_retries": 3,
        # User Interaction
        "allow_clarification": True,
        # MCP Configuration (optional)
        "mcp_config": {
            "url": "https://your-mcp-server.com",
            "tools": ["custom_search", "database_query"],
            "auth_required": True
        },
        "mcp_prompt": "Use custom_search for internal documents"
    }
}
# Load configuration
config = Configuration.from_runnable_config(runtime_config)
print(f"Research model: {config.research_model}")
print(f"Search API: {config.search_api.value}")
```
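The `search_api` option accepts a fixed set of strings, and the `.value` access above suggests an enum. The following stand-in shows how such a value could be validated before it reaches the agent; `SketchSearchAPI` and `parse_search_api` are hypothetical names, and the member list is taken only from the inline comment in the configuration above.

```python
from enum import Enum


class SketchSearchAPI(Enum):
    """Hypothetical stand-in mirroring the options named in the comment above."""

    TAVILY = "tavily"
    ANTHROPIC = "anthropic"
    OPENAI = "openai"
    NONE = "none"


def parse_search_api(value: str) -> SketchSearchAPI:
    """Convert a string to the enum, with a readable error for unknown values."""
    try:
        return SketchSearchAPI(value.strip().lower())
    except ValueError:
        options = ", ".join(member.value for member in SketchSearchAPI)
        raise ValueError(f"unknown search_api {value!r}; expected one of: {options}") from None


chosen = parse_search_api("Tavily")
print(chosen.value)
```

Failing early with the list of valid options is usually easier to debug than an error raised deep inside a research run.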
--------------------------------
### Direct Search with Summarization
Source: https://context7.com/langchain-ai/open_deep_research/llms.txt
Executes a search with multiple queries and uses a summarization model to process results. Configuration options for summarization and content length are provided.
```python
import asyncio
from open_deep_research.utils import tavily_search
async def search_example():
    config = {
        "configurable": {
            "summarization_model": "openai:gpt-4.1-mini",
            "summarization_model_max_tokens": 8192,
            "max_content_length": 50000,
            "max_structured_output_retries": 3
        }
    }
    # Execute search with multiple queries
    results = await tavily_search.ainvoke(
        {
            "queries": [
                "latest developments in AI safety research 2024",
                "OpenAI AI alignment approaches",
                "Anthropic constitutional AI methodology"
            ]
        },
        config
    )
    print(results)
    # Output format:
    # Search results:
    # --- SOURCE 1: Article Title ---
    # URL: https://example.com/article
    # SUMMARY:
    # Main findings...
    # Important quotes...
asyncio.run(search_example())
```
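Because the tool returns one formatted string, downstream code that needs the individual sources has to parse it. The sketch below extracts titles and URLs with a regular expression; `extract_sources` is a hypothetical helper, and it assumes the output follows the layout shown in the comments above exactly, which real output may not.

```python
import re
from typing import Dict, List

# Matches header lines such as "--- SOURCE 1: Article Title ---".
SOURCE_HEADER = re.compile(r"^--- SOURCE (\d+): (.+?) ---$")


def extract_sources(text: str) -> List[Dict[str, str]]:
    """Collect the index, title, and URL of each source block."""
    sources: List[Dict[str, str]] = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = SOURCE_HEADER.match(line)
        if match:
            current = {"index": match.group(1), "title": match.group(2), "url": ""}
            sources.append(current)
        elif current is not None and line.startswith("URL:") and not current["url"]:
            current["url"] = line[len("URL:"):].strip()
    return sources


sample_output = """Search results:
--- SOURCE 1: Article Title ---
URL: https://example.com/article
SUMMARY:
Main findings...
"""
parsed_sources = extract_sources(sample_output)
print(parsed_sources)
```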
--------------------------------
### SupervisorState Initialization
Source: https://context7.com/langchain-ai/open_deep_research/llms.txt
Initializes the SupervisorState, which tracks the supervisor's research management, including messages, research brief, notes, and iteration counts.
```python
from open_deep_research.state import (
    AgentState,
    AgentInputState,
    SupervisorState,
    ResearcherState,
    ConductResearch,
    ResearchComplete,
    ClarifyWithUser,
    ResearchQuestion
)
# SupervisorState - Tracks supervisor's research management
supervisor_state: SupervisorState = {
    "supervisor_messages": [],
    "research_brief": "Comprehensive analysis of AI applications in healthcare...",
    "notes": [],
    "research_iterations": 0,
    "raw_notes": []
}
```
--------------------------------
### Structured Output for ClarifyWithUser
Source: https://context7.com/langchain-ai/open_deep_research/llms.txt
Defines a structured output for ClarifyWithUser, indicating the need for clarification and posing a specific question to the user regarding the healthcare domain of interest.
```python
from open_deep_research.state import ClarifyWithUser
clarify = ClarifyWithUser(
    need_clarification=True,
    question="Are you interested in a specific healthcare domain (e.g., diagnostics, drug discovery, patient care)?",
    verification=""
)
```
--------------------------------
### Run Deep Researcher with a Question
Source: https://context7.com/langchain-ai/open_deep_research/llms.txt
This asynchronous function orchestrates the complete research workflow from user input to final report generation. It configures the research parameters and invokes the deep_researcher graph.
```python
from langgraph.graph import StateGraph
from open_deep_research.deep_researcher import deep_researcher
from open_deep_research.configuration import Configuration
from langchain_core.messages import HumanMessage
# Run the deep researcher with a research question
async def run_research(question: str):
    config = {
        "configurable": {
            "research_model": "openai:gpt-4.1",
            "summarization_model": "openai:gpt-4.1-mini",
            "compression_model": "openai:gpt-4.1",
            "final_report_model": "openai:gpt-4.1",
            "search_api": "tavily",
            "max_researcher_iterations": 6,
            "max_concurrent_research_units": 5,
            "allow_clarification": True
        }
    }
    result = await deep_researcher.ainvoke(
        {"messages": [HumanMessage(content=question)]},
        config
    )
    return result["final_report"]
# Example usage
import asyncio
report = asyncio.run(run_research("Compare the approaches of OpenAI and Anthropic to AI safety"))
print(report)
```
--------------------------------
### Structured Output for ResearchComplete
Source: https://context7.com/langchain-ai/open_deep_research/llms.txt
Represents a structured output indicating that the research process is complete.
```python
from open_deep_research.state import ResearchComplete
research_complete = ResearchComplete()
```
--------------------------------
### Think Tool for Strategic Reflection
Source: https://context7.com/langchain-ai/open_deep_research/llms.txt
Utilizes the think_tool for strategic reflection during research. It takes a detailed reflection string as input, analyzing findings, assessing gaps, and making strategic decisions.
```python
from open_deep_research.utils import think_tool
# The think_tool is used internally by researchers to reflect on progress
# Example of how it's called during research:
reflection = think_tool.invoke({
    "reflection": """
    Analysis of current findings:
    - Found 3 relevant sources on AI safety approaches
    - OpenAI focuses on RLHF
    - Anthropic emphasizes Constitutional AI and interpretability research
    Gap assessment:
    - Missing: concrete examples of safety failures
    - Missing: comparison of resource allocation
    Strategic decision:
    - Need one more search for safety incident examples
    - Then have sufficient information to provide comprehensive answer
    """
})
print(reflection)
# Output: "Reflection recorded: Analysis of current findings..."
```
--------------------------------
### ResearcherState Initialization
Source: https://context7.com/langchain-ai/open_deep_research/llms.txt
Initializes the ResearcherState for an individual researcher, including their messages, tool call iterations, research topic, and notes.
```python
from langchain_core.messages import HumanMessage
from open_deep_research.state import ResearcherState
# ResearcherState - Individual researcher's working state
researcher_state: ResearcherState = {
    "researcher_messages": [HumanMessage(content="AI diagnostic tools in radiology")],
    "tool_call_iterations": 0,
    "research_topic": "AI diagnostic tools in radiology",
    "compressed_research": "",
    "raw_notes": []
}
```
--------------------------------
### Call Deep Research API
Source: https://context7.com/langchain-ai/open_deep_research/llms.txt
This Python script demonstrates how to programmatically interact with a deployed deep research API. It creates a thread, runs a deep researcher with specific inputs and configurations, polls for completion, and retrieves the final report. Ensure the API server is running at http://127.0.0.1:2024.
```python
import httpx
import asyncio
async def call_research_api():
    async with httpx.AsyncClient(timeout=300.0) as client:
        # Create a new thread
        response = await client.post(
            "http://127.0.0.1:2024/threads",
            json={}
        )
        thread_id = response.json()["thread_id"]
        # Run the deep researcher
        response = await client.post(
            f"http://127.0.0.1:2024/threads/{thread_id}/runs",
            json={
                "assistant_id": "Deep Researcher",
                "input": {
                    "messages": [{
                        "role": "human",
                        "content": "What are the latest breakthroughs in fusion energy?"
                    }]
                },
                "config": {
                    "configurable": {
                        "research_model": "openai:gpt-4.1",
                        "search_api": "tavily"
                    }
                }
            }
        )
        run_id = response.json()["run_id"]
        # Poll until the run reaches a terminal status
        while True:
            status = await client.get(
                f"http://127.0.0.1:2024/threads/{thread_id}/runs/{run_id}"
            )
            run_status = status.json()["status"]
            if run_status == "success":
                break
            # Stop on failure instead of polling forever (status names are assumed)
            if run_status in ("error", "timeout", "interrupted"):
                raise RuntimeError(f"Run ended with status: {run_status}")
            await asyncio.sleep(2)
        # Get final state
        state = await client.get(
            f"http://127.0.0.1:2024/threads/{thread_id}/state"
        )
        return state.json()["values"]["final_report"]
report = asyncio.run(call_research_api())
print(report)
```
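The polling step can be factored into a reusable helper with an overall deadline. The sketch below is standard-library only and takes the status lookup as a callable, so it can be tried without a running server; `wait_for_run` is a hypothetical name, and the failure status names are assumptions to verify against the server's API docs.

```python
import asyncio
import time
from typing import Awaitable, Callable

# Assumed terminal failure statuses; confirm them against the server's API docs.
FAILURE_STATUSES = {"error", "timeout", "interrupted"}


async def wait_for_run(
    fetch_status: Callable[[], Awaitable[str]],
    interval: float = 2.0,
    max_wait: float = 600.0,
) -> str:
    """Poll fetch_status until success, a failure status, or the deadline."""
    deadline = time.monotonic() + max_wait
    while True:
        status = await fetch_status()
        if status == "success":
            return status
        if status in FAILURE_STATUSES:
            raise RuntimeError(f"run ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status!r} after {max_wait} seconds")
        await asyncio.sleep(interval)


# Demo with a fake status source instead of an HTTP call.
_fake_statuses = iter(["pending", "running", "success"])


async def _fake_fetch() -> str:
    return next(_fake_statuses)


final_status = asyncio.run(wait_for_run(_fake_fetch, interval=0.01))
print(final_status)
```

In the script above, `fetch_status` would wrap the `client.get(...)` call and return `response.json()["status"]`.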
--------------------------------
### Structured Output for ConductResearch
Source: https://context7.com/langchain-ai/open_deep_research/llms.txt
Defines a structured output for the ConductResearch tool, specifying the research topic with detailed requirements for accuracy, approvals, and study focus.
```python
from open_deep_research.state import ConductResearch
# Structured outputs for tool calls
conduct_research = ConductResearch(
    research_topic="""Research the current state of AI-powered diagnostic tools
    in radiology, including accuracy rates, FDA approvals, and adoption in
    clinical settings. Focus on peer-reviewed studies from 2022-2024."""
)
```
--------------------------------
### Tavily Search Tool
Source: https://context7.com/langchain-ai/open_deep_research/llms.txt
Executes web searches using Tavily API with automatic content summarization for research tasks. Supports both synchronous and asynchronous operations.
```python
from open_deep_research.utils import tavily_search, tavily_search_async
import asyncio
```
--------------------------------
### Extract LangSmith Data for Deep Research Bench
Source: https://github.com/langchain-ai/open_deep_research/blob/main/README.md
Extract evaluation results from LangSmith into a JSONL file format required by the Deep Research Bench. Specify your project name, model name, and dataset name.
```bash
python tests/extract_langsmith_data.py --project-name "YOUR_EXPERIMENT_NAME" --model-name "your-model-name" --dataset-name "deep_research_bench"
```
--------------------------------
### Raw Tavily Search without Summarization
Source: https://context7.com/langchain-ai/open_deep_research/llms.txt
Performs a raw search using Tavily, allowing for raw content inclusion and specifying search parameters like max results and topic. Results are printed with title, URL, and truncated content.
```python
import asyncio
from open_deep_research.utils import tavily_search_async
async def raw_search_example():
    results = await tavily_search_async(
        search_queries=["quantum computing breakthroughs 2024"],
        max_results=5,
        topic="general",  # Options: "general", "news", "finance"
        include_raw_content=True,
        config={"configurable": {}}
    )
    # One response per query, each holding a list of results
    for response in results:
        for result in response['results']:
            print(f"Title: {result['title']}")
            print(f"URL: {result['url']}")
            print(f"Content: {result['content'][:200]}...")
asyncio.run(raw_search_example())
```
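When several queries are sent at once, the same page can appear in more than one response. The sketch below merges responses and keeps the first result seen for each URL; `unique_results_by_url` is a hypothetical helper, and the response shape (a `results` list of dicts with `url`, `title`, and `content`) is inferred from the loop above.

```python
from typing import Any, Dict, List


def unique_results_by_url(responses: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Merge search responses, keeping the first result seen for each URL."""
    unique: Dict[str, Dict[str, Any]] = {}
    for response in responses:
        for result in response.get("results", []):
            url = result.get("url")
            if url and url not in unique:
                unique[url] = result
    return unique


# Made-up sample data in the assumed response shape.
sample_responses = [
    {"results": [
        {"title": "A", "url": "https://example.com/a", "content": "first"},
        {"title": "B", "url": "https://example.com/b", "content": "second"},
    ]},
    {"results": [
        {"title": "A again", "url": "https://example.com/a", "content": "duplicate"},
    ]},
]
deduplicated = unique_results_by_url(sample_responses)
print(list(deduplicated))
```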