### Install Youtu-GraphRAG with Docker
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/FULLGUIDE.md
This snippet outlines the steps to clone the Youtu-GraphRAG project, set up the environment variables, build a Docker image, and run the container for a quick web experience. It assumes Docker is already installed.
```bash
git clone https://github.com/TencentCloudADP/youtu-graphrag
cd youtu-graphrag && cp .env.example .env
# Configure your LLM API in .env using the OpenAI API format
# LLM_MODEL=deepseek-chat
# LLM_BASE_URL=https://api.deepseek.com
# LLM_API_KEY=sk-xxxxxx
docker build -t youtu_graphrag:v1 .
docker run -d -p 8000:8000 youtu_graphrag:v1
curl -v http://localhost:8000
```
--------------------------------
### Source Code Quick Start for Youtu-GraphRAG
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/README-CN.md
This snippet outlines the steps to set up Youtu-GraphRAG using Python and pip. It includes cloning the repository, creating and configuring the .env file, setting up a virtual environment, running a setup script, starting the service, and verifying its operation.
```bash
git clone https://github.com/TencentCloudADP/youtu-graphrag
cd youtu-graphrag && touch .env
# Configure an OpenAI-compatible LLM API in .env using the following format
# LLM_MODEL=deepseek-chat
# LLM_BASE_URL=https://api.deepseek.com
# LLM_API_KEY=sk-xxxxxx
python -m venv venv
source venv/bin/activate # Linux/macOS
./setup_env.sh
./start.sh
curl -v http://localhost:8000 # Check that the service is running
```
--------------------------------
### Run Youtu-GraphRAG using Conda Environment
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/FULLGUIDE.md
This section details the process of setting up the Youtu-GraphRAG environment using Conda. It includes cloning the project, configuring the .env file, creating and activating a Conda environment, and executing the setup script.
```bash
git clone https://github.com/TencentCloudADP/youtu-graphrag
cd youtu-graphrag && cp .env.example .env
# Configure your LLM API in .env using the OpenAI API format
# LLM_MODEL=deepseek-chat
# LLM_BASE_URL=https://api.deepseek.com
# LLM_API_KEY=sk-xxxxxx
# Create the conda environment.
conda create -n YouTuGraphRAG python=3.10
conda activate YouTuGraphRAG
# Setup environment
# You can also run `bash ./setup_env.sh` to do the same thing.
chmod +x setup_env.sh
./setup_env.sh
```
--------------------------------
### Docker Quick Start for Youtu-GraphRAG
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/README-CN.md
This snippet provides the bash commands to clone the Youtu-GraphRAG repository, configure the environment variables by copying an example file, build a Docker image, run the Docker container, and test the service with curl.
```bash
git clone https://github.com/TencentCloudADP/youtu-graphrag
cd youtu-graphrag && cp .env.example .env
# Configure an OpenAI-compatible LLM API in .env using the following format
# LLM_MODEL=deepseek-chat
# LLM_BASE_URL=https://api.deepseek.com
# LLM_API_KEY=sk-xxxxxx
docker build -t youtu_graphrag:v1 .
docker run -d -p 8000:8000 youtu_graphrag:v1
curl -v http://localhost:8000
```
--------------------------------
### Web UI Experience Setup and Launch
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/README.md
Steps to set up the environment for the web UI, including cloning the repository, configuring environment variables, setting up dependencies using setup_env.sh, and launching the web service with start.sh.
```bash
# 1. Clone Youtu-GraphRAG project
git clone https://github.com/TencentCloudADP/youtu-graphrag
# 2. Create .env according to .env.example
cd youtu-graphrag && cp .env.example .env
# Configure your LLM API in .env using the OpenAI API format
# LLM_MODEL=deepseek-chat
# LLM_BASE_URL=https://api.deepseek.com
# LLM_API_KEY=sk-xxxxxx
# 3. Setup environment
./setup_env.sh
# 4. Launch the web
./start.sh
# 5. Visit http://localhost:8000
curl -v http://localhost:8000
```
--------------------------------
### Specialized Youtu-GraphRAG Functions via Command Line
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/FULLGUIDE.md
Illustrates how to execute specific functionalities of Youtu-GraphRAG from the command line, such as building the knowledge graph only, executing retrieval only, and applying performance optimization configurations.
```bash
# 1. Build knowledge graph only
python main.py --override '{"triggers": {"constructor_trigger": true, "retrieve_trigger": false}}' --datasets demo
# 2. Execute retrieval only (skip construction)
python main.py --override '{"triggers": {"constructor_trigger": false, "retrieve_trigger": true}}' --datasets demo
# 3. Performance optimization configuration
python main.py --override '{"construction": {"max_workers": 64}, "embeddings": {"batch_size": 64}}' --datasets demo
```
--------------------------------
### Basic Command Line Usage for Youtu-GraphRAG
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/FULLGUIDE.md
Demonstrates fundamental command-line operations for Youtu-GraphRAG, including running with default configurations, specifying multiple datasets, using a custom configuration file, and overriding runtime parameters.
```bash
# 1. Run with default configuration
python main.py --datasets demo
# 2. Specify multiple datasets
python main.py --datasets hotpot 2wiki musique
# 3. Use custom configuration file
python main.py --config my_config.yaml --datasets demo
# 4. Runtime parameter override
python main.py --override '{"retrieval": {"top_k_filter": 50}, "triggers": {"mode": "noagent"}}' --datasets demo
```
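Hand-writing the JSON for `--override` inside shell quotes is easy to get wrong. The sketch below generates the command string from a plain object instead. `shellQuote` and `buildOverrideCommand` are hypothetical helpers, not part of the project; only the `--override` and `--datasets` flags and the configuration keys come from the examples above, and dataset names are assumed to contain no shell metacharacters.

```javascript
// Hypothetical helper: wraps text in single quotes for a POSIX shell,
// escaping any embedded single quote as '\''.
function shellQuote(text) {
    return "'" + text.split("'").join("'\\''") + "'";
}

// Hypothetical helper: serializes the override object and assembles the command.
// Dataset names are assumed to be simple identifiers, so they are not quoted.
function buildOverrideCommand(overrides, datasets) {
    const json = JSON.stringify(overrides);
    return `python main.py --override ${shellQuote(json)} --datasets ${datasets.join(' ')}`;
}

const command = buildOverrideCommand(
    { retrieval: { top_k_filter: 50 }, triggers: { mode: 'noagent' } },
    ['demo']
);
console.log(command);
```

Serializing with `JSON.stringify` guarantees valid JSON, and single-quoting keeps the shell from interpreting the double quotes and braces.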
--------------------------------
### Configure LLM Parameters
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/FULLGUIDE.md
Sets the LLM model to 'gpt-3.5-turbo', temperature to 0.7, and maximum tokens to 1500. These parameters control the behavior and output of the language model.
```bash
python main.py --override '{
"llm": {
"model": "gpt-3.5-turbo",
"temperature": 0.7,
"max_tokens": 1500
}
}' --datasets demo
```
--------------------------------
### Configure Embedding Model
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/FULLGUIDE.md
Specifies the embedding model as 'sentence-transformers/all-MiniLM-L6-v2', sets the batch size to 16, and the device to 'cpu'. This configuration determines how text data is converted into numerical embeddings.
```bash
python main.py --override '{
"embeddings": {
"model_name": "sentence-transformers/all-MiniLM-L6-v2",
"batch_size": 16,
"device": "cpu"
}
}' --datasets demo
```
--------------------------------
### JavaScript Event Listener Setup
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/frontend/index.html
Configures event listeners for user interactions, including file uploads via click and drag-and-drop, and keyboard shortcuts for submitting questions. It also handles tab switching for different application sections.
```javascript
function setupEventListeners() {
// File upload
const uploadArea = document.getElementById('uploadArea');
const fileInput = document.getElementById('fileInput');
uploadArea.addEventListener('click', () => fileInput.click());
uploadArea.addEventListener('dragover', handleDragOver);
uploadArea.addEventListener('dragleave', handleDragLeave);
uploadArea.addEventListener('drop', handleDrop);
fileInput.addEventListener('change', handleFileSelect);
// Question input
document.getElementById('questionInput').addEventListener('keydown', function(e) {
if (e.ctrlKey && e.key === 'Enter') {
askQuestion();
}
});
}
function switchTab(tabName) {
// Update tab buttons
document.querySelectorAll('.tab').forEach(tab => tab.classList.remove('active'));
event.target.classList.add('active');
// Update tab content
document.querySelectorAll('.tab-content').forEach(content => content.classList.remove('active'));
document.getElementById(tabName).classList.add('active');
// Load data for specific tabs
if (tabName === 'upload') {
loadDatasets();
} else if (tabName === 'graph') {
loadDatasetOptions('graphDataset');
} else if (tabName === 'qa') {
loadDatasetOptions('qaDataset');
}
}
```
--------------------------------
### CPU Performance Optimization
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/FULLGUIDE.md
Optimizes for CPU environments by setting construction max workers to 4 and embedding batch size to 8, with the device set to 'cpu'. This configuration is suitable for systems with limited GPU resources.
```bash
python main.py --override '{
"construction": {"max_workers": 4},
"embeddings": {"batch_size": 8, "device": "cpu"}
}' --datasets demo
```
--------------------------------
### Configure Construction Settings
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/FULLGUIDE.md
Sets the maximum workers for construction to 32, chunk size to 512, and overlap size to 50. These parameters are crucial for building the graph structure from the data.
```bash
python main.py --override '{
"construction": {
"max_workers": 32,
"chunk_size": 512,
"overlap_size": 50
}
}' --datasets demo
```
--------------------------------
### Configure Retrieval Settings
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/FULLGUIDE.md
Sets the top K filter for retrieval to 30, a chunk similarity threshold of 0.7, and a batch size of 32. This configuration impacts how relevant chunks are selected during the retrieval phase.
```bash
python main.py --override '{
"retrieval": {
"top_k_filter": 30,
"chunk_similarity_threshold": 0.7,
"batch_size": 32
}
}' --datasets demo
```
--------------------------------
### Memory Optimization for Low Memory Environments
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/FULLGUIDE.md
Configures for low memory environments by setting construction max workers to 2, embedding batch size to 4, and retrieval top K filter to 10. This approach minimizes memory usage.
```bash
python kt_rag.py --override '{
"construction": {"max_workers": 2},
"embeddings": {"batch_size": 4},
"retrieval": {"top_k_filter": 10}
}' --datasets demo
```
--------------------------------
### GPU Performance Optimization
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/FULLGUIDE.md
Optimizes for GPU environments by setting construction max workers to 16 and embedding batch size to 64, with the device set to 'cuda'. This configuration leverages GPU acceleration for faster processing.
```bash
python main.py --override '{
"construction": {"max_workers": 16},
"embeddings": {"batch_size": 64, "device": "cuda"}
}' --datasets demo
```
--------------------------------
### Clean and Extract Entity Information (JavaScript)
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/frontend/index.html
Provides utility functions for processing entity strings. `extractSchemaType` extracts a schema type from a string using a regex pattern, while `cleanEntityName` removes this pattern to get a cleaner name. These are useful for data normalization.
```javascript
function extractSchemaType(entityStr) {
const match = entityStr.match(/\[schema_type:\s*([^\]]+)\]/);
return match ? match[1].trim() : null;
}
```
```javascript
function cleanEntityName(entityStr) {
return entityStr.replace(/\s*\[schema_type:[^\]]+\]/g, '').trim();
}
```
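A short usage sketch shows what the two helpers return for a tagged and an untagged string. The sample entity string is hypothetical; real graph data may be formatted differently.

```javascript
// Copies of the two helpers so the example runs on its own.
function extractSchemaType(entityStr) {
    const match = entityStr.match(/\[schema_type:\s*([^\]]+)\]/);
    return match ? match[1].trim() : null;
}

function cleanEntityName(entityStr) {
    return entityStr.replace(/\s*\[schema_type:[^\]]+\]/g, '').trim();
}

// Hypothetical input, chosen only for illustration.
const sample = 'Marie Curie [schema_type: person]';

const parsed = {
    name: cleanEntityName(sample),                 // tag removed, whitespace trimmed
    schemaType: extractSchemaType(sample),         // text captured after "schema_type:"
    untagged: extractSchemaType('Plain entity'),   // no tag present, so null
};
console.log(parsed);
```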
--------------------------------
### Build and Run with Docker
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/README.md
Instructions for cloning the repository, configuring environment variables for LLM API, building the Docker image, and running the Docker container to serve the application.
```bash
# 1. Clone Youtu-GraphRAG project
git clone https://github.com/TencentCloudADP/youtu-graphrag
# 2. Create .env according to .env.example
cd youtu-graphrag && cp .env.example .env
# Config your LLM api in .env as OpenAI API format
# LLM_MODEL=deepseek-chat
# LLM_BASE_URL=https://api.deepseek.com
# LLM_API_KEY=sk-xxxxxx
# 3. Build with dockerfile
docker build -t youtu_graphrag:v1 .
# 4. Docker run
docker run -d -p 8000:8000 youtu_graphrag:v1
# 5. Visit http://localhost:8000
curl -v http://localhost:8000
```
--------------------------------
### Ask a Question and Display Answer (JavaScript)
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/frontend/index.html
Handles the process of asking a question via a web interface. It retrieves user input, makes a POST request to an API endpoint (`/api/ask-question`) using Axios, and then displays the answer and retrieval details. Includes loading state management and error handling.
```javascript
async function askQuestion() {
    const datasetName = document.getElementById('qaDataset').value;
    const question = document.getElementById('questionInput').value.trim();

    // Both a dataset and a non-empty question are required
    if (!datasetName || !question) {
        showMessage('Please select a dataset and enter a question', 'error');
        return;
    }

    const loading = document.getElementById('qaLoading');
    const answerSection = document.getElementById('answerSection');
    const askBtn = document.getElementById('askBtn');

    // Enter the loading state
    loading.classList.add('show');
    answerSection.classList.add('hidden');
    askBtn.disabled = true;

    try {
        const response = await axios.post(`${API_BASE}/api/ask-question`, {
            question: question,
            dataset_name: datasetName,
            client_id: 'web_client'
        });
        const result = response.data;
        displayAnswer(result);
        questionCount++;
        updateStats();
    } catch (error) {
        console.error('Question failed:', error);
        showMessage('Failed to process question: ' + (error.response?.data?.detail || error.message), 'error');
    } finally {
        // Always leave the loading state, whether the request succeeded or failed
        loading.classList.remove('show');
        askBtn.disabled = false;
    }
}
```
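The catch block prefers a server-supplied `detail` field and falls back to the generic error message. A minimal sketch of that fallback as a standalone function follows. `extractErrorMessage` is a hypothetical name, the final `'Unknown error'` fallback is an addition for illustration, and the sample error objects are made up.

```javascript
// Hypothetical helper mirroring the expression used in the catch block:
// server-provided detail first, then the error's own message.
function extractErrorMessage(error) {
    return error?.response?.data?.detail || error?.message || 'Unknown error';
}

// Axios-style error that carries a response body (shape assumed for illustration).
const serverError = {
    message: 'Request failed with status code 500',
    response: { data: { detail: 'Dataset not ready' } },
};

// Network-level failure: no response object is attached.
const networkError = new Error('Network Error');

console.log(extractErrorMessage(serverError));
console.log(extractErrorMessage(networkError));
```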
--------------------------------
### JavaScript Global Variables and Initialization
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/frontend/index.html
Defines global variables for managing application state, including selected files, datasets, and chart instances. It also sets up the initial application state and event listeners upon page load.
```javascript
let selectedFiles = [];
let datasets = [];
let currentDataset = null;
let graphChart = null;
// let queryChart = null;
let questionCount = 0;
// API base URL
const API_BASE = '';
// Initialize the app
document.addEventListener('DOMContentLoaded', function() {
initializeApp();
setupEventListeners();
});
function initializeApp() {
refreshData();
console.log('Youtu-GraphRAG initialized');
}
```
--------------------------------
### Load Dataset Options for Select Element (JavaScript)
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/frontend/index.html
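Populates a given HTML select element with options for datasets that have a 'ready' status. It filters the global `datasets` array and creates one `<option>` element per ready dataset.
```javascript
// ... (function opening and the option markup inside the string literals are elided)
'' + readyDatasets.map(d => ``).join('');
}
```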
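The snippet for this function is incomplete, so the following is a hedged sketch of how such option markup could be built, not the project's actual code. `buildDatasetOptions`, `escapeHtml`, and the placeholder text are hypothetical. The `name` and `status` fields and the `'ready'` value are taken from their use elsewhere in the frontend code.

```javascript
// Hypothetical helper: escapes characters that are special in HTML attributes and text.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Hypothetical reconstruction: returns the markup as a string so it can be
// checked without a DOM. A caller would assign it to select.innerHTML.
function buildDatasetOptions(datasets, placeholder = 'Select a dataset') {
    const readyDatasets = datasets.filter(d => d.status === 'ready');
    return `<option value="">${escapeHtml(placeholder)}</option>` +
        readyDatasets
            .map(d => `<option value="${escapeHtml(d.name)}">${escapeHtml(d.name)}</option>`)
            .join('');
}

const html = buildDatasetOptions([
    { name: 'demo', status: 'ready' },
    { name: 'hotpot', status: 'constructing' }, // filtered out: not ready
]);
console.log(html);
```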
--------------------------------
### Youtu-GraphRAG Citation
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/README.md
BibTeX entry for citing the Youtu-GraphRAG project in academic work.
```bibtex
@misc{dong2025youtugraphrag,
title={Youtu-GraphRAG: Vertically Unified Agents for Graph Retrieval-Augmented Complex Reasoning},
author={Junnan Dong and Siyu An and Yifei Yu and Qian-Wen Zhang and Linhao Luo and Xiao Huang and Yunsheng Wu and Di Yin and Xing Sun},
year={2025},
eprint={2508.19855},
archivePrefix={arXiv},
url={https://arxiv.org/abs/2508.19855},
}
```
--------------------------------
### Initialize and Set Graph Chart Options (JavaScript)
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/frontend/index.html
Configures and sets options for a graph chart, likely using ECharts. It defines graph layout, data sources (nodes, links, categories), and label formatting. Dependencies include the ECharts library and a global `graphChart` object.
```javascript
function renderGraph(data) {
    const option = {
        // Category names; guarded so a missing `categories` array does not throw
        dataset: {
            source: (data.categories || []).map(function (c) { return c.name; })
        },
        series: [{
            type: 'graph',
            layout: 'force',
            data: data.nodes || [],
            links: data.links || [],
            categories: data.categories || [],
            roam: true,
            label: {
                show: true,
                color: 'rgba(255, 255, 255, 0.9)',
                // Collapse whitespace and truncate long names to 20 characters
                formatter: function (p) {
                    const d = p.data || {};
                    let name = (d.name || '').toString().replace(/\s+/g, ' ').trim();
                    if (name.length > 20) name = name.slice(0, 20) + '...';
                    return name || '';
                }
            },
            force: { repulsion: 1000, gravity: 0.1, edgeLength: 120 },
            lineStyle: { opacity: 0.6, color: 'rgba(255, 255, 255, 0.4)' }
        }]
    };
    graphChart.setOption(option);
}
```
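The label formatter normalizes whitespace and caps names at 20 characters. A small sketch isolates that logic so it can be checked on its own. `truncateLabel` is a hypothetical name and the sample strings are made up.

```javascript
// Hypothetical helper extracted from the formatter logic:
// collapse runs of whitespace, trim, then truncate with an ellipsis.
function truncateLabel(rawName, maxLength = 20) {
    const name = (rawName || '').toString().replace(/\s+/g, ' ').trim();
    return name.length > maxLength ? name.slice(0, maxLength) + '...' : name;
}

const shortLabel = truncateLabel('  Marie   Curie ');
const longLabel = truncateLabel('Vertically Unified Agents for Graph RAG');
const emptyLabel = truncateLabel(undefined);

console.log([shortLabel, longLabel, emptyLabel]);
```

Truncating in the formatter keeps dense force-directed layouts readable; the full name remains available in the node data for tooltips.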
--------------------------------
### Display Datasets and Manage UI Elements (JavaScript)
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/frontend/index.html
This function fetches and displays a list of datasets in the UI, handling cases where no datasets are available. It dynamically generates HTML elements for each dataset, including status badges and action buttons (Construct, Reconstruct, Upload Schema, Delete). It also manages the visual representation of custom versus default schemas.
```javascript
function displayDatasets() {
const container = document.getElementById('datasetsList');
if (datasets.length === 0) {
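// Empty-state markup elided; only its message text is kept.
container.innerHTML = 'No datasets available. Upload some files to get started.';
return;
}
// ... rendering of each dataset entry (status badge, action buttons) elided;
// the original maps the datasets to template strings and joins them with .join('').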
}
```
--------------------------------
### ECharts Graph Rendering Configuration (JavaScript)
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/frontend/index.html
Configures and initializes an ECharts graph instance. It sets up the chart's appearance, tooltip behavior for nodes and edges, and legend options. It also handles disposing of the previous chart instance to prevent memory leaks.
```javascript
function renderGraph(data) {
const chartContainer = document.getElementById('graphChart');
if (graphChart) {
graphChart.dispose();
}
graphChart = echarts.init(chartContainer);
const option = {
backgroundColor: 'transparent',
// title: {
// text: 'Knowledge Graph',
// left: 'center',
// textStyle: { color: 'rgba(255, 255, 255, 0.95)' }
// },
tooltip: {
trigger: 'item',
backgroundColor: 'rgba(0, 0, 0, 0.8)',
textStyle: { color: '#ffffff' },
formatter: function (params) {
if (params.dataType === 'node') {
const d = params.data || {};
let name = (d.name || '').toString().replace(/\s+/g,' ').trim();
if (name.length > 20) name = name.slice(0,20) + '...';
const category = d.category || d.type || '';
return name ? name : category || 'node';
} else if (params.dataType === 'edge') {
return params.data && params.data.name ? params.data.name : '';
}
return '';
}
},
legend: {
type: 'scroll',
bottom: 10,
textStyle: { color: 'rgba(255, 255, 255, 0.85)' },
data: (data.categories &
```
--------------------------------
### Initiate Graph Construction with WebSocket Updates (JavaScript)
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/frontend/index.html
This function initiates the graph construction process for a given dataset. It updates the dataset's status to 'constructing' immediately and establishes a WebSocket connection to receive real-time progress updates and completion/error notifications. It sends the construction request via Axios and includes a timeout for WebSocket communication.
```javascript
async function constructGraph(datasetName) {
// Immediately mark this dataset as 'constructing' on the frontend
if (datasets && Array.isArray(datasets)) {
for (let ds of datasets) {
if (ds.name === datasetName) {
ds.status = 'constructing';
}
}
displayDatasets();
}
try {
showMessage('Starting graph construction...', 'info');
// Open a WebSocket connection to receive real-time progress
const wsProto = window.location.protocol === 'https:' ? 'wss' : 'ws';
const ws = new WebSocket(`${wsProto}://${window.location.host}/ws/web_client`);
let progressMessages = [];
ws.onopen = function() {
console.log('WebSocket connected for progress updates');
};
ws.onmessage = function(event) {
try {
const data = JSON.parse(event.data);
if (data.type === 'progress') {
progressMessages.push(data.message);
showMessage(`[Construct ${progressMessages.length}] ${data.message}`, 'info');
} else if (data.type === 'complete') {
showMessage('Graph construction completed!', 'success');
// refresh datasets immediately to reflect ready status
refreshData();
ws.close();
} else if (data.type === 'error') {
showMessage(`Construction error: ${data.message}`, 'error');
refreshData();
ws.close();
}
} catch (e) {
console.log('Progress update:', event.data);
}
};
ws.onerror = function(error) {
console.log('WebSocket error:', error);
};
ws.onclose = function() {
console.log('WebSocket connection closed');
};
// Send construct request
const response = await axios.post(`${API_BASE}/api/construct-graph`, {
dataset_name: datasetName
}, {
params: {
client_id: 'web_client'
}
});
// If no WebSocket messages, close after timeout
setTimeout(() => {
if (ws.readyState === WebSocket.OPEN) {
ws.close();
if (progressMessages.length === 0) {
showMessage('Graph construction completed!', 'success');
}
}
}, 30000); // 30s timeout
// Do not call refreshData immediately; refresh once the WebSocket reports complete or error
} catch (error) {
console.error('Construction failed:', error);
showMessage('Graph construction failed: ' + (error.response?.data?.detail || error.message), 'error');
}
}
```
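The message handling above can be sketched as two pure functions, which makes the protocol easier to test without a live socket. `buildWsUrl`, `reduceProgress`, and the state shape are hypothetical. The message types (`progress`, `complete`, `error`) and the `/ws/<client_id>` path come from the function above, and the sample messages are made up.

```javascript
// Hypothetical helper: derives the WebSocket URL from a location-like object,
// using wss when the page itself is served over https.
function buildWsUrl(location, clientId) {
    const wsProto = location.protocol === 'https:' ? 'wss' : 'ws';
    return `${wsProto}://${location.host}/ws/${clientId}`;
}

// Hypothetical reducer: folds one raw message into the progress state.
// `done` tells the caller when to close the socket and refresh the dataset list.
function reduceProgress(state, rawMessage) {
    let data;
    try {
        data = JSON.parse(rawMessage);
    } catch (e) {
        return state; // non-JSON payloads leave the state unchanged
    }
    if (!data || typeof data !== 'object') return state;

    switch (data.type) {
        case 'progress':
            return { ...state, messages: [...state.messages, data.message] };
        case 'complete':
            return { ...state, done: true, status: 'success' };
        case 'error':
            return { ...state, done: true, status: 'error', error: data.message };
        default:
            return state;
    }
}

const initialState = { messages: [], done: false, status: 'pending', error: null };

// Made-up message sequence: one progress update, one malformed payload, then completion.
const finalState = [
    '{"type":"progress","message":"Chunking documents"}',
    'not json',
    '{"type":"complete"}',
].reduce((state, raw) => reduceProgress(state, raw), initialState);

const wsUrl = buildWsUrl({ protocol: 'https:', host: 'example.com' }, 'web_client');
console.log(wsUrl, finalState);
```

In a real handler, `ws.onmessage` would call the reducer with `event.data` and close the socket once `done` is true.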
--------------------------------
### JavaScript File Handling: Selection and Updates
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/frontend/index.html
Manages file selection via input elements and updates the displayed file list. It includes functions to add new files, update the UI to show selected files with options to remove them, and clear the selected files.
```javascript
function handleFileSelect(e) {
const files = Array.from(e.target.files);
addFiles(files);
}
function addFiles(files) {
selectedFiles = [...selectedFiles, ...files];
updateFileList();
document.getElementById('uploadBtn').disabled = selectedFiles.length === 0;
}
function updateFileList() {
const fileList = document.getElementById('fileList');
if (selectedFiles.length === 0) {
fileList.classList.add('hidden');
return;
}
fileList.classList.remove('hidden');
fileList.innerHTML = selectedFiles.map((file, index) =>
`
📄 ${file.name}${formatFileSize(file.size)}
`
).join('');
}
function removeFile(index) {
selectedFiles.splice(index, 1);
updateFileList();
document.getElementById('uploadBtn').disabled = selectedFiles.length === 0;
}
function clearFiles() {
selectedFiles = [];
document.getElementById('fileInput').value = '';
updateFileList();
document.getElementById('uploadBtn').disabled = true;
}
```
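`updateFileList` calls `formatFileSize`, which is not shown in this snippet. The following is a hypothetical implementation, assuming binary units (1024 bytes per KB) and one decimal place above the byte range; the project's own helper may differ.

```javascript
// Hypothetical implementation of the size formatter used by the file list.
// Assumes binary units and one decimal place for KB and above.
function formatFileSize(bytes) {
    if (!Number.isFinite(bytes) || bytes <= 0) return '0 B';

    const units = ['B', 'KB', 'MB', 'GB', 'TB'];
    let value = bytes;
    let index = 0;

    // Step up one unit at a time until the value drops below 1024
    while (value >= 1024 && index < units.length - 1) {
        value /= 1024;
        index++;
    }

    const text = index === 0 ? String(value) : value.toFixed(1);
    return `${text} ${units[index]}`;
}

const sizes = [0, 512, 1536, 1048576].map(formatFileSize);
console.log(sizes);
```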
--------------------------------
### Initialize and Render ECharts Graph (JavaScript)
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/frontend/index.html
Initializes an ECharts instance on a given chart container and sets its options to render a graph. It also handles disposing of the previous chart instance if one exists. This function is responsible for the visual display of query decomposition.
```javascript
function renderQueryChart(data) {
const chartContainer = document.getElementById('queryChart');
if (queryChart) {
queryChart.dispose();
}
queryChart = echarts.init(chartContainer);
const option = { /* ... chart options ... */ };
queryChart.setOption(option);
}
```
--------------------------------
### Display Retrieval Details (JavaScript)
Source: https://github.com/tencentcloudadp/youtu-graphrag/blob/main/frontend/index.html
Formats and displays detailed statistics about the question-answering process, including the number of sub-questions, total retrieved triples, and relevant chunks. It also shows decomposed sub-questions with their respective triples, chunks, and processing times. Includes a helper function `dedupTriples` to remove duplicate triples.
```javascript
function displayRetrievalDetails(result) {
    console.log('displayRetrievalDetails called with:', result);

    // Helper: deduplicate triples array of strings '(s, r, o)' retaining order
    function dedupTriples(arr) {
        if (!Array.isArray(arr)) return [];
        const seen = new Set();
        const out = [];
        for (const t of arr) {
            if (typeof t !== 'string') continue;
            const m = t.match(/\(([^,]+),\s*([^,]+),\s*([^\)]+)\)/);
            if (!m) continue;
            // Case-insensitive key built from subject, relation, and object
            const key = m.slice(1).map(x => x.trim().toLowerCase()).join('|');
            if (!seen.has(key)) {
                seen.add(key);
                out.push(`(${m[1].trim()}, ${m[2].trim()}, ${m[3].trim()})`);
            }
        }
        return out;
    }

    const detailsContainer = document.getElementById('retrievalDetails');
    if (!detailsContainer) {
        console.error('Retrieval details container not found');
        return;
    }
    console.log('Found retrieval details container, showing it');

    // Show the container
    detailsContainer.style.display = 'block';

    // Compute total retrieved triples from visible sub-question steps
    // (sum of each step's deduped triples list)
    let subQuestionTriplesTotal = 0;
    if (result.sub_questions && result.sub_questions.length > 0 && Array.isArray(result.reasoning_steps)) {
        for (let i = 0; i < result.sub_questions.length; i++) {
            const step = result.reasoning_steps[i];
            if (step && Array.isArray(step.triples)) {
                subQuestionTriplesTotal += dedupTriples(step.triples).length;
            } else if (step && typeof step.triples_count === 'number') {
                // fallback if only count exists
                subQuestionTriplesTotal += step.triples_count;
            }
        }
    }

    let detailsHtml = `