# DiffBIR: Blind Image Restoration with Generative Diffusion Prior

DiffBIR is a blind image restoration framework that leverages generative diffusion priors from Stable Diffusion to restore degraded images. It supports three main tasks: Blind Super-Resolution (BSR), Blind Face Restoration (BFR), and Blind Image Denoising (BID). The framework uses a two-stage pipeline: stage 1 removes degradations with a cleaner model (SwinIR, BSRNet, or SCUNet), and stage 2 applies a ControlNet-based diffusion model to generate high-quality, realistic details.

The project offers three model versions (v1, v2, v2.1), with v2.1 being the latest. It features improved image captioning via LLaVA, better samplers, and enhanced tiled-sampling support for processing high-resolution images on limited VRAM. DiffBIR automatically downloads pretrained weights and provides both command-line inference and a Gradio web interface for interactive use.

## Installation

Install DiffBIR by cloning the repository and setting up a conda environment.

```bash
# Clone the repository
git clone https://github.com/XPixelGroup/DiffBIR.git
cd DiffBIR

# Create and activate conda environment
conda create -n diffbir python=3.10
conda activate diffbir
pip install -r requirements.txt
```

## Blind Image Super-Resolution (BSR)

Upscale low-resolution images while restoring details and removing degradation artifacts using the BSR inference loop.
```bash
# DiffBIR v2.1 (recommended) - uses LLaVA for auto-captioning
python -u inference.py \
  --task sr \
  --upscale 4 \
  --version v2.1 \
  --captioner llava \
  --cfg_scale 8 \
  --noise_aug 0 \
  --input inputs/demo/bsr \
  --output results/v2.1_demo_bsr

# DiffBIR v2 (ECCV paper version) - no auto-captioning
python -u inference.py \
  --task sr \
  --upscale 4 \
  --version v2 \
  --sampler spaced \
  --steps 50 \
  --captioner none \
  --pos_prompt '' \
  --neg_prompt 'low quality, blurry, low-resolution, noisy, unsharp, weird textures' \
  --cfg_scale 4 \
  --input inputs/demo/bsr \
  --output results/v2_demo_bsr \
  --device cuda --precision fp32
```

## Blind Face Restoration (BFR) - Aligned Faces

Restore degraded face images that are pre-aligned (cropped and centered on the face) using the specialized face restoration pipeline.

```bash
# DiffBIR v2.1 for aligned face restoration
python -u inference.py \
  --task face \
  --upscale 1 \
  --version v2.1 \
  --captioner llava \
  --cfg_scale 8 \
  --noise_aug 0 \
  --input inputs/demo/bfr/aligned \
  --output results/v2.1_demo_bfr_aligned

# DiffBIR v2 for aligned face restoration
python -u inference.py \
  --task face \
  --upscale 1 \
  --version v2 \
  --sampler spaced \
  --steps 50 \
  --captioner none \
  --pos_prompt '' \
  --neg_prompt 'low quality, blurry, low-resolution, noisy, unsharp, weird textures' \
  --cfg_scale 4.0 \
  --input inputs/demo/bfr/aligned \
  --output results/v2_demo_bfr_aligned \
  --device cuda --precision fp32
```

## Blind Face Restoration (BFR) - Unaligned/Whole Image

Restore faces within full scene images, automatically detecting and enhancing both faces and background.
```bash
# DiffBIR v2.1 for unaligned face + background restoration
python -u inference.py \
  --task face_background \
  --upscale 2 \
  --version v2.1 \
  --captioner llava \
  --cfg_scale 8 \
  --noise_aug 0 \
  --input inputs/demo/bfr/whole_img \
  --output results/v2.1_demo_bfr_unaligned

# DiffBIR v2 for unaligned face + background restoration
python -u inference.py \
  --task face_background \
  --upscale 2 \
  --version v2 \
  --sampler spaced \
  --steps 50 \
  --captioner none \
  --pos_prompt '' \
  --neg_prompt 'low quality, blurry, low-resolution, noisy, unsharp, weird textures' \
  --cfg_scale 4.0 \
  --input inputs/demo/bfr/whole_img \
  --output results/v2_demo_bfr_unaligned \
  --device cuda --precision fp32
```

## Blind Image Denoising (BID)

Remove noise from images while preserving and enhancing details using the denoising inference loop.

```bash
# DiffBIR v2.1 for image denoising
python -u inference.py \
  --task denoise \
  --upscale 1 \
  --version v2.1 \
  --captioner llava \
  --cfg_scale 8 \
  --noise_aug 0 \
  --input inputs/demo/bid \
  --output results/v2.1_demo_bid

# DiffBIR v2 for image denoising
python -u inference.py \
  --task denoise \
  --upscale 1 \
  --version v2 \
  --sampler spaced \
  --steps 50 \
  --captioner none \
  --pos_prompt '' \
  --neg_prompt 'low quality, blurry, low-resolution, noisy, unsharp, weird textures' \
  --cfg_scale 4.0 \
  --input inputs/demo/bid \
  --output results/v2_demo_bid \
  --device cuda --precision fp32
```

## Tiled Sampling for Low VRAM

Process high-resolution images on GPUs with limited memory by enabling tiled inference for all pipeline stages.
```bash
# Enable tiled sampling for all stages to reduce VRAM usage
python -u inference.py \
  --task sr \
  --upscale 4 \
  --version v2.1 \
  --captioner llava \
  --cfg_scale 8 \
  --input inputs/demo/bsr \
  --output results/tiled_output \
  --cleaner_tiled \
  --cleaner_tile_size 256 \
  --cleaner_tile_stride 128 \
  --vae_encoder_tiled \
  --vae_encoder_tile_size 256 \
  --vae_decoder_tiled \
  --vae_decoder_tile_size 256 \
  --cldm_tiled \
  --cldm_tile_size 512 \
  --cldm_tile_stride 256
```

## Custom Model Inference

Run inference with your own trained IRControlNet checkpoint by specifying the training config and model path.

```bash
# Inference with custom-trained model
python -u inference.py \
  --upscale 4 \
  --version custom \
  --train_cfg path/to/training/config.yaml \
  --ckpt path/to/saved/checkpoint.pt \
  --captioner llava \
  --cfg_scale 8 \
  --noise_aug 0 \
  --input inputs/demo/bsr \
  --output results/custom_demo_bsr
```

## Gradio Web Interface

Launch an interactive web UI for real-time image restoration with adjustable parameters.

```bash
# Launch Gradio interface with LLaVA captioner (recommended)
python run_gradio.py --captioner llava

# For low-VRAM systems, use RAM captioner or disable captioning
python run_gradio.py --captioner ram
python run_gradio.py --captioner none

# Control LLaVA quantization (4-bit default, 8-bit, or 16-bit)
python run_gradio.py --captioner llava --llava_bit 8
```

## Training Stage 1 (SwinIR Cleaner)

Train the stage-1 degradation removal model (SwinIR) on your own dataset.

```bash
# 1. Generate file lists for training and validation sets
find /path/to/images -type f > files.list
shuf files.list > files_shuf.list
head -n 10000 files_shuf.list > train.list
tail -n +10001 files_shuf.list > val.list

# 2. Edit configs/train/train_stage1.yaml with your paths:
#    - dataset.train.file_list: path/to/train.list
#    - dataset.val.file_list: path/to/val.list
#    - train.exp_dir: path/to/experiment/output

# 3. Start training with accelerate
accelerate launch train_stage1.py --config configs/train/train_stage1.yaml
```

## Training Stage 2 (IRControlNet)

Train the stage-2 ControlNet model that generates realistic details using diffusion.

```bash
# 1. Download pretrained Stable Diffusion v2.1
wget https://huggingface.co/stabilityai/stable-diffusion-2-1-base/resolve/main/v2-1_512-ema-pruned.ckpt --no-check-certificate

# 2. Generate training file list (same as stage 1)
find /path/to/images -type f > train.list

# 3. Edit configs/train/train_stage2.yaml with your paths:
#    - train.sd_path: path/to/v2-1_512-ema-pruned.ckpt
#    - train.swinir_path: path/to/stage1/checkpoint.pt
#    - train.exp_dir: path/to/experiment/output
#    - dataset.train.file_list: path/to/train.list

# 4. Start training with accelerate
accelerate launch train_stage2.py --config configs/train/train_stage2.yaml
```

## Python API - Pipeline Usage

Use DiffBIR programmatically in Python by importing the pipeline classes and inference loops.

```python
import torch
import numpy as np
from PIL import Image
from omegaconf import OmegaConf

from diffbir.model import ControlLDM, SwinIR, Diffusion
from diffbir.pipeline import SwinIRPipeline
from diffbir.utils.common import instantiate_from_config, load_model_from_url
from diffbir.inference.pretrained_models import MODELS

device = "cuda"

# Load stage-1 cleaner model (SwinIR)
swinir = instantiate_from_config(OmegaConf.load("configs/inference/swinir.yaml"))
swinir.load_state_dict(load_model_from_url(MODELS["swinir_realesrgan"]))
swinir.eval().to(device)

# Load stage-2 model (ControlLDM)
cldm = instantiate_from_config(OmegaConf.load("configs/inference/cldm.yaml"))
sd_weight = load_model_from_url(MODELS["sd_v2.1_zsnr"])
cldm.load_pretrained_sd(sd_weight)
control_weight = load_model_from_url(MODELS["v2.1"])
cldm.load_controlnet_from_ckpt(control_weight)
cldm.eval().to(device)
cldm.cast_dtype(torch.float16)

# Load diffusion scheduler
diffusion = instantiate_from_config(OmegaConf.load("configs/inference/diffusion_v2.1.yaml"))
diffusion.to(device)

# Create pipeline
pipeline = SwinIRPipeline(swinir, cldm, diffusion, None, device)

# Load and process image
lq = Image.open("input.jpg").convert("RGB")
lq = lq.resize((lq.width * 4, lq.height * 4), Image.BICUBIC)  # Upscale for SR
lq_array = np.array(lq)[None]  # Add batch dimension

# Run restoration
with torch.no_grad(), torch.autocast(device, torch.float16):
    result = pipeline.run(
        lq_array,
        steps=10,
        strength=1.0,
        cleaner_tiled=False,
        cleaner_tile_size=256,
        cleaner_tile_stride=128,
        vae_encoder_tiled=False,
        vae_encoder_tile_size=256,
        vae_decoder_tiled=False,
        vae_decoder_tile_size=256,
        cldm_tiled=True,
        cldm_tile_size=512,
        cldm_tile_stride=256,
        pos_prompt="high quality, detailed",
        neg_prompt="low quality, blurry",
        cfg_scale=8.0,
        start_point_type="noise",
        sampler_type="edm_dpm++_3m_sde",
        noise_aug=0,
        rescale_cfg=False,
        s_churn=0,
        s_tmin=0,
        s_tmax=300,
        s_noise=1,
        eta=1,
        order=1,
    )

# Save result
Image.fromarray(result[0]).save("output.png")
```

## Sampler Options

DiffBIR supports multiple diffusion samplers with different speed/quality trade-offs.
```bash
# Available samplers (use with --sampler flag):
#
# Fast samplers (10-15 steps recommended):
# - edm_dpm++_3m_sde (default, high quality)
# - edm_dpm++_2m_sde
# - edm_dpm++_2m
# - dpm++_m2
#
# Standard samplers (50 steps recommended):
# - spaced
# - ddim
#
# EDM samplers with stochastic options:
# - edm_euler, edm_euler_a
# - edm_heun
# - edm_dpm_2, edm_dpm_2_a
# - edm_lms
# - edm_dpm++_2s_a
# - edm_dpm++_sde

# Example: Fast high-quality sampling
python -u inference.py \
  --task sr --version v2.1 \
  --sampler edm_dpm++_3m_sde \
  --steps 10 \
  --input inputs/demo/bsr --output results/fast

# Example: Traditional DDIM sampling
python -u inference.py \
  --task sr --version v2 \
  --sampler ddim \
  --steps 50 \
  --start_point_type cond \
  --input inputs/demo/bsr --output results/ddim
```

## Pretrained Model Versions

DiffBIR offers three model versions optimized for different use cases.

| Version | Stage-2 Model | Training Data | Best For |
|---------|---------------|---------------|----------|
| v2.1 | DiffBIR_v2.1.pt | Unsplash + LLaVA captions | General use, best quality |
| v2 | v2.pth | Laion2b-en filtered | ECCV paper reproduction |
| v1 | v1_general.pth / v1_face.pth | ImageNet-1k / FFHQ | Task-specific models |

```bash
# Download models manually (auto-downloaded during inference)
# v2.1 (recommended)
wget https://huggingface.co/lxq007/DiffBIR-v2/resolve/main/DiffBIR_v2.1.pt

# v2
wget https://huggingface.co/lxq007/DiffBIR-v2/resolve/main/v2.pth

# v1 models
wget https://huggingface.co/lxq007/DiffBIR-v2/resolve/main/v1_general.pth
wget https://huggingface.co/lxq007/DiffBIR-v2/resolve/main/v1_face.pth
```

## Summary

DiffBIR provides a unified framework for blind image restoration tasks including super-resolution, face restoration, and denoising. The two-stage pipeline combines traditional degradation removal with diffusion-based detail generation, achieving state-of-the-art results on real-world degraded images.
Key features include automatic image captioning via LLaVA for context-aware restoration, tiled sampling for processing high-resolution images on limited VRAM, and multiple sampler options for balancing speed and quality.

For typical usage, start with DiffBIR v2.1 using the `edm_dpm++_3m_sde` sampler at 10 steps with LLaVA captioning enabled for best results. Enable tiled sampling when processing large images or when running on GPUs with less than 12GB of VRAM. The framework supports both command-line batch processing and interactive web-based inference via Gradio, making it suitable for both research workflows and practical applications.
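As a starting point, the recommended settings above can be combined into a single invocation. This is a sketch assembled from the flags shown in earlier sections; the input and output paths are placeholders to replace with your own directories.

```shell
# Recommended starting point: v2.1, fast EDM sampler at 10 steps, LLaVA captions
python -u inference.py \
  --task sr \
  --upscale 4 \
  --version v2.1 \
  --sampler edm_dpm++_3m_sde \
  --steps 10 \
  --captioner llava \
  --cfg_scale 8 \
  --noise_aug 0 \
  --input path/to/your/images \
  --output results/recommended \
  --device cuda
```

If this runs out of memory on a small GPU, append the tiled-sampling flags from the "Tiled Sampling for Low VRAM" section rather than lowering the upscale factor.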