### Set Up Environment Variables

Source: https://github.com/langchain-ai/open_deep_research/blob/main/README.md

Copy the example environment file to create your own `.env` file for custom configuration settings.

```bash
cp .env.example .env
```

--------------------------------

### Launch LangGraph Server

Source: https://github.com/langchain-ai/open_deep_research/blob/main/README.md

Start the LangGraph server locally with the Open Deep Research project. This command installs dependencies and launches the development server, providing API and UI endpoints.

```bash
# Install dependencies and start the LangGraph server
uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev --allow-blocking
```

--------------------------------

### Install Dependencies

Source: https://github.com/langchain-ai/open_deep_research/blob/main/README.md

Install project dependencies using uv. You can sync all dependencies or install them from the `pyproject.toml` file.

```bash
uv sync
# or
uv pip install -r pyproject.toml
```

--------------------------------

### Programmatic Evaluation Setup

Source: https://context7.com/langchain-ai/open_deep_research/llms.txt

Define the run configuration for a programmatic evaluation. The dictionary selects the models for research, summarization, compression, and final reporting, along with the search API. The comments list the nine quality criteria the evaluation checks.

```python
# Programmatic evaluation setup
from tests.evaluators import evaluate_report_quality
from tests.run_evaluate import run_evaluation

# Run evaluation with custom configuration
config = {
    "configurable": {
        "research_model": "anthropic:claude-sonnet-4-20250514",
        "summarization_model": "openai:gpt-4.1-mini",
        "compression_model": "openai:gpt-4.1",
        "final_report_model": "openai:gpt-4.1",
        "search_api": "tavily"
    }
}

# Evaluation checks 9 quality criteria:
# 1. Topic Relevance - Does report address the topic?
# 2. Section Relevance - Are all sections relevant?
# 3. Structure and Flow - Logical narrative flow?
# 4. Introduction Quality - Proper context and scope?
# 5. Conclusion Quality - Summarizes key findings?
# 6. Structural Elements - Tables, lists usage?
# 7. Section Headers - Correct Markdown formatting?
# 8. Citations - Proper source citation?
# 9. Overall Quality - Well-researched and accurate?
```

--------------------------------

### LangGraph Server Deployment Command

Source: https://context7.com/langchain-ai/open_deep_research/llms.txt

Install dependencies and start the LangGraph server locally for the deep researcher project. The command pins the Python version and installs the project in editable mode.

```bash
# Install dependencies and start the LangGraph server
uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev --allow-blocking

# Server endpoints:
# - API: http://127.0.0.1:2024
# - Studio UI: https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024
# - API Docs: http://127.0.0.1:2024/docs
```

--------------------------------

### Environment Configuration for API Keys

Source: https://context7.com/langchain-ai/open_deep_research/llms.txt

Environment variables for API keys and LangSmith tracing. The file lists keys for OpenAI, Anthropic, Google, and Tavily, plus the LangSmith variables for the API key, project name, and tracing toggle. These values are needed to run evaluations and to call the corresponding services.

```bash
# .env file configuration
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
TAVILY_API_KEY=tvly-...

# LangSmith tracing (optional but recommended)
LANGSMITH_API_KEY=ls-...
LANGSMITH_PROJECT=open-deep-research
LANGSMITH_TRACING=true
```

--------------------------------

### Clone Repository and Activate Virtual Environment

Source: https://github.com/langchain-ai/open_deep_research/blob/main/README.md

Clone the repository and set up a virtual environment for the project. Use the activation command that matches your operating system.

```bash
git clone https://github.com/langchain-ai/open_deep_research.git
cd open_deep_research
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

--------------------------------

### Load Configuration from Environment

Source: https://context7.com/langchain-ai/open_deep_research/llms.txt

Loads environment variables with dotenv and initializes the `Configuration` object. Individual values can be overridden with a runtime configuration.

```python
# Load configuration from environment
from dotenv import load_dotenv
from open_deep_research.configuration import Configuration

load_dotenv()

# Environment variables are automatically loaded
config = Configuration.from_runnable_config({})
print(f"Using model: {config.research_model}")

# Override with runtime config
runtime_config = Configuration.from_runnable_config({
    "configurable": {
        "research_model": "anthropic:claude-sonnet-4-20250514",
        "search_api": "anthropic"  # Use Anthropic's native web search
    }
})
```

--------------------------------

### AgentInputState Initialization

Source: https://context7.com/langchain-ai/open_deep_research/llms.txt

Initializes the `AgentInputState` with a `HumanMessage`, the entry point for user input in the research workflow.
```python
from langchain_core.messages import HumanMessage, AIMessage

# AgentInputState - Entry point for user messages
input_state = {
    "messages": [HumanMessage(content="Research the impact of AI on healthcare")]
}
```

--------------------------------

### Run Deep Research Bench Evaluation

Source: https://context7.com/langchain-ai/open_deep_research/llms.txt

Bash commands that set the environment variables and run the Deep Research Bench evaluation. They export the API keys for LangSmith, OpenAI, and Tavily, run the evaluation script, and then extract the results. Replace the placeholders with your actual keys and experiment name.

```bash
# Set up environment variables
export LANGSMITH_API_KEY="your-langsmith-key"
export LANGSMITH_PROJECT="deep-research-eval"
export OPENAI_API_KEY="your-openai-key"
export TAVILY_API_KEY="your-tavily-key"

# Run evaluation (costs ~$20-$100 depending on model)
python tests/run_evaluate.py

# Extract results for submission
python tests/extract_langsmith_data.py \
  --project-name "YOUR_EXPERIMENT_NAME" \
  --model-name "gpt-4.1" \
  --dataset-name "deep_research_bench"

# Output: tests/expt_results/deep_research_bench_gpt-4.1.jsonl
```

--------------------------------

### Run Comprehensive Evaluation

Source: https://github.com/langchain-ai/open_deep_research/blob/main/README.md

Run a full evaluation of the project on LangSmith datasets.

```bash
python tests/run_evaluate.py
```

--------------------------------

### Configure MCP Tools Integration

Source: https://context7.com/langchain-ai/open_deep_research/llms.txt

Configures and loads Model Context Protocol (MCP) tools to extend research capabilities. The code defines the MCP server configuration, including authentication and the available tools, and then loads those tools alongside the existing toolset. Make sure your MCP server is reachable and configured correctly.
```python
import asyncio

from open_deep_research.utils import load_mcp_tools
from open_deep_research.configuration import MCPConfig

# Configure MCP server with authentication
mcp_config = MCPConfig(
    url="https://your-mcp-server.com",
    tools=["document_search", "database_query", "calendar_check"],
    auth_required=True
)

# Runtime configuration with MCP
config = {
    "configurable": {
        "mcp_config": mcp_config.model_dump(),
        "mcp_prompt": """
        Use document_search for internal company documents.
        Use database_query for structured data lookups.
        Use calendar_check for scheduling information.
        """
    },
    "metadata": {
        "owner": "user-123"
    }
}

# Load MCP tools (called internally by get_all_tools)
async def load_tools_example():
    existing_tools = {"tavily_search", "think_tool", "ResearchComplete"}
    mcp_tools = await load_mcp_tools(config, existing_tools)

    for tool in mcp_tools:
        print(f"Loaded MCP tool: {tool.name}")
        print(f"Description: {tool.description}")

asyncio.run(load_tools_example())
```

--------------------------------

### Configuration Class for Research Agent

Source: https://context7.com/langchain-ai/open_deep_research/llms.txt

Defines all configurable parameters for the research agent, including model selection, search API, and MCP settings. Use `Configuration.from_runnable_config` to load configuration from a dictionary.
```python
from open_deep_research.configuration import Configuration, SearchAPI, MCPConfig
from langchain_core.runnables import RunnableConfig

# Create configuration from environment variables and runtime config
runtime_config: RunnableConfig = {
    "configurable": {
        # Model Configuration
        "summarization_model": "openai:gpt-4.1-mini",  # Summarizes search results
        "summarization_model_max_tokens": 8192,
        "research_model": "openai:gpt-4.1",  # Powers the search agent
        "research_model_max_tokens": 10000,
        "compression_model": "openai:gpt-4.1",  # Compresses research findings
        "compression_model_max_tokens": 8192,
        "final_report_model": "openai:gpt-4.1",  # Writes final report
        "final_report_model_max_tokens": 10000,

        # Search Configuration
        "search_api": "tavily",  # Options: "tavily", "anthropic", "openai", "none"
        "max_content_length": 50000,

        # Research Limits
        "max_researcher_iterations": 6,
        "max_react_tool_calls": 10,
        "max_concurrent_research_units": 5,
        "max_structured_output_retries": 3,

        # User Interaction
        "allow_clarification": True,

        # MCP Configuration (optional)
        "mcp_config": {
            "url": "https://your-mcp-server.com",
            "tools": ["custom_search", "database_query"],
            "auth_required": True
        },
        "mcp_prompt": "Use custom_search for internal documents"
    }
}

# Load configuration
config = Configuration.from_runnable_config(runtime_config)
print(f"Research model: {config.research_model}")
print(f"Search API: {config.search_api.value}")
```

--------------------------------

### Direct Search with Summarization

Source: https://context7.com/langchain-ai/open_deep_research/llms.txt

Executes a search with multiple queries and uses a summarization model to process the results. The configuration sets the summarization model, its token limit, the maximum content length, and the structured-output retry count.
```python
import asyncio

from open_deep_research.utils import tavily_search

async def search_example():
    config = {
        "configurable": {
            "summarization_model": "openai:gpt-4.1-mini",
            "summarization_model_max_tokens": 8192,
            "max_content_length": 50000,
            "max_structured_output_retries": 3
        }
    }

    # Execute search with multiple queries
    results = await tavily_search.ainvoke(
        {
            "queries": [
                "latest developments in AI safety research 2024",
                "OpenAI AI alignment approaches",
                "Anthropic constitutional AI methodology"
            ]
        },
        config
    )

    print(results)
    # Output format:
    # Search results:
    # --- SOURCE 1: Article Title ---
    # URL: https://example.com/article
    # SUMMARY:
    # Main findings...
    # Important quotes...

asyncio.run(search_example())
```

--------------------------------

### SupervisorState Initialization

Source: https://context7.com/langchain-ai/open_deep_research/llms.txt

Initializes the `SupervisorState`, which tracks the supervisor's research management: messages, research brief, notes, and iteration count.

```python
from open_deep_research.state import (
    AgentState,
    AgentInputState,
    SupervisorState,
    ResearcherState,
    ConductResearch,
    ResearchComplete,
    ClarifyWithUser,
    ResearchQuestion
)

# SupervisorState - Tracks supervisor's research management
supervisor_state: SupervisorState = {
    "supervisor_messages": [],
    "research_brief": "Comprehensive analysis of AI applications in healthcare...",
    "notes": [],
    "research_iterations": 0,
    "raw_notes": []
}
```

--------------------------------

### Structured Output for ClarifyWithUser

Source: https://context7.com/langchain-ai/open_deep_research/llms.txt

Builds a `ClarifyWithUser` structured output that flags the need for clarification and asks the user which healthcare domain they are interested in.
```python
from open_deep_research.state import ClarifyWithUser

clarify = ClarifyWithUser(
    need_clarification=True,
    question="Are you interested in a specific healthcare domain (e.g., diagnostics, drug discovery, patient care)?",
    verification=""
)
```

--------------------------------

### Run Deep Researcher with a Question

Source: https://context7.com/langchain-ai/open_deep_research/llms.txt

This asynchronous function runs the complete research workflow, from user input to final report generation. It sets the research parameters and invokes the `deep_researcher` graph.

```python
import asyncio

from open_deep_research.deep_researcher import deep_researcher
from langchain_core.messages import HumanMessage

# Run the deep researcher with a research question
async def run_research(question: str):
    config = {
        "configurable": {
            "research_model": "openai:gpt-4.1",
            "summarization_model": "openai:gpt-4.1-mini",
            "compression_model": "openai:gpt-4.1",
            "final_report_model": "openai:gpt-4.1",
            "search_api": "tavily",
            "max_researcher_iterations": 6,
            "max_concurrent_research_units": 5,
            "allow_clarification": True
        }
    }

    result = await deep_researcher.ainvoke(
        {"messages": [HumanMessage(content=question)]},
        config
    )

    return result["final_report"]

# Example usage
report = asyncio.run(run_research("Compare the approaches of OpenAI and Anthropic to AI safety"))
print(report)
```

--------------------------------

### Structured Output for ResearchComplete

Source: https://context7.com/langchain-ai/open_deep_research/llms.txt

Represents a structured output indicating that the research process is complete.

```python
from open_deep_research.state import ResearchComplete

research_complete = ResearchComplete()
```

--------------------------------

### Think Tool for Strategic Reflection

Source: https://context7.com/langchain-ai/open_deep_research/llms.txt

Uses `think_tool` for strategic reflection during research.
It takes a detailed reflection string as input: an analysis of the findings so far, an assessment of the gaps, and a strategic decision about what to do next.

```python
from open_deep_research.utils import think_tool

# The think_tool is used internally by researchers to reflect on progress
# Example of how it's called during research:
reflection = think_tool.invoke({
    "reflection": """
    Analysis of current findings:
    - Found 3 relevant sources on AI safety approaches
    - OpenAI focuses on RLHF
    - Anthropic emphasizes Constitutional AI and interpretability research

    Gap assessment:
    - Missing: concrete examples of safety failures
    - Missing: comparison of resource allocation

    Strategic decision:
    - Need one more search for safety incident examples
    - Then have sufficient information to provide comprehensive answer
    """
})

print(reflection)
# Output: "Reflection recorded: Analysis of current findings..."
```

--------------------------------

### ResearcherState Initialization

Source: https://context7.com/langchain-ai/open_deep_research/llms.txt

Initializes the `ResearcherState` for an individual researcher: messages, tool call iterations, research topic, compressed research, and raw notes.

```python
from langchain_core.messages import HumanMessage
from open_deep_research.state import ResearcherState

# ResearcherState - Individual researcher's working state
researcher_state: ResearcherState = {
    "researcher_messages": [HumanMessage(content="AI diagnostic tools in radiology")],
    "tool_call_iterations": 0,
    "research_topic": "AI diagnostic tools in radiology",
    "compressed_research": "",
    "raw_notes": []
}
```

--------------------------------

### Call Deep Research API

Source: https://context7.com/langchain-ai/open_deep_research/llms.txt

This Python script calls a deployed deep research API. It creates a thread, starts a deep researcher run with specific inputs and configuration, polls until the run finishes, and retrieves the final report. The API server must be running at http://127.0.0.1:2024.
```python
import httpx
import asyncio

async def call_research_api():
    async with httpx.AsyncClient(timeout=300.0) as client:
        # Create a new thread
        response = await client.post(
            "http://127.0.0.1:2024/threads",
            json={}
        )
        thread_id = response.json()["thread_id"]

        # Run the deep researcher
        response = await client.post(
            f"http://127.0.0.1:2024/threads/{thread_id}/runs",
            json={
                "assistant_id": "Deep Researcher",
                "input": {
                    "messages": [{
                        "role": "human",
                        "content": "What are the latest breakthroughs in fusion energy?"
                    }]
                },
                "config": {
                    "configurable": {
                        "research_model": "openai:gpt-4.1",
                        "search_api": "tavily"
                    }
                }
            }
        )
        run_id = response.json()["run_id"]

        # Poll until the run reaches a terminal status
        while True:
            status = await client.get(
                f"http://127.0.0.1:2024/threads/{thread_id}/runs/{run_id}"
            )
            run_status = status.json()["status"]
            if run_status == "success":
                break
            # Stop on failure instead of polling forever
            if run_status in {"error", "timeout", "interrupted"}:
                raise RuntimeError(f"Run ended with status: {run_status}")
            await asyncio.sleep(2)

        # Get final state
        state = await client.get(
            f"http://127.0.0.1:2024/threads/{thread_id}/state"
        )
        return state.json()["values"]["final_report"]

report = asyncio.run(call_research_api())
print(report)
```

--------------------------------

### Structured Output for ConductResearch

Source: https://context7.com/langchain-ai/open_deep_research/llms.txt

Builds a `ConductResearch` structured output whose research topic spells out the requirements: accuracy rates, FDA approvals, clinical adoption, and the range of studies to focus on.

```python
from open_deep_research.state import ConductResearch

# Structured outputs for tool calls
conduct_research = ConductResearch(
    research_topic="""Research the current state of AI-powered diagnostic tools in radiology,
    including accuracy rates, FDA approvals, and adoption in clinical settings.
    Focus on peer-reviewed studies from 2022-2024."""
)
```

--------------------------------

### Tavily Search Tool

Source: https://context7.com/langchain-ai/open_deep_research/llms.txt

Executes web searches using the Tavily API with automatic content summarization for research tasks. Supports both synchronous and asynchronous operations.
```python
from open_deep_research.utils import tavily_search, tavily_search_async
import asyncio
```

--------------------------------

### Extract LangSmith Data for Deep Research Bench

Source: https://github.com/langchain-ai/open_deep_research/blob/main/README.md

Extract evaluation results from LangSmith into the JSONL file format required by the Deep Research Bench. Specify your project name, model name, and dataset name.

```bash
python tests/extract_langsmith_data.py --project-name "YOUR_EXPERIMENT_NAME" --model-name "your-model-name" --dataset-name "deep_research_bench"
```

--------------------------------

### Raw Tavily Search without Summarization

Source: https://context7.com/langchain-ai/open_deep_research/llms.txt

Performs a raw Tavily search without summarization. The call can include raw page content and accepts search parameters such as the maximum number of results and the topic. Each result is printed with its title, URL, and the first 200 characters of its content.

```python
import asyncio

from open_deep_research.utils import tavily_search_async

async def raw_search_example():
    results = await tavily_search_async(
        search_queries=["quantum computing breakthroughs 2024"],
        max_results=5,
        topic="general",  # Options: "general", "news", "finance"
        include_raw_content=True,
        config={"configurable": {}}
    )

    for response in results:
        for result in response['results']:
            print(f"Title: {result['title']}")
            print(f"URL: {result['url']}")
            print(f"Content: {result['content'][:200]}...")

asyncio.run(raw_search_example())
```
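When several queries are sent in one raw search, the same page can come back more than once. The following is a minimal sketch of merging such responses, assuming each response is a dict with a `results` list whose items carry `title`, `url`, and `content` keys, as the loop in the raw search snippet implies. The helper name `dedupe_results`, its `preview_chars` parameter, and the sample data are hypothetical and not part of the library; no network call is made.

```python
from typing import Any


def dedupe_results(
    responses: list[dict[str, Any]], preview_chars: int = 200
) -> list[dict[str, str]]:
    """Flatten per-query responses and keep the first result seen for each URL."""
    seen: set[str] = set()
    unique: list[dict[str, str]] = []
    for response in responses:
        for result in response.get("results", []):
            url = result["url"]
            if url in seen:
                continue  # the same page was already returned by another query
            seen.add(url)
            unique.append({
                "title": result["title"],
                "url": url,
                "preview": result["content"][:preview_chars],
            })
    return unique


# Made-up sample data in the assumed response shape (hypothetical values).
sample_responses = [
    {"results": [
        {"title": "A", "url": "https://example.com/a", "content": "alpha " * 50},
        {"title": "B", "url": "https://example.com/b", "content": "beta"},
    ]},
    {"results": [
        {"title": "A again", "url": "https://example.com/a", "content": "duplicate"},
    ]},
]

for item in dedupe_results(sample_responses):
    print(f"{item['title']}: {item['url']}")
```

Keeping the first occurrence preserves the order in which queries were issued, so results from earlier queries take precedence; a different policy (for example, keeping the longest content) would need a small change to the loop.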