### Install Conda and Run Setup Script

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/Colab-Inference.ipynb

Installs condacolab and Anaconda, then runs the setup script to prepare the environment.

```python
%pip install -q condacolab
import condacolab
condacolab.install_from_url("https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Linux-x86_64.sh")
!cd /content && bash setup.sh
```

--------------------------------

### Starting the API Server

Source: https://context7.com/rvc-boss/gpt-sovits/llms.txt

Starts the primary API server with the specified pretrained models and default reference audio.

````APIDOC
## Starting the API Server (api.py)

The primary API server provides REST endpoints for text-to-speech synthesis, with support for streaming audio output and model switching. `-s` and `-g` select the SoVITS and GPT weights, `-dr`, `-dt`, and `-dl` set the default reference audio, its transcript, and its language, and `-a` and `-p` set the bind address and port.

```bash
python api.py \
  -s "GPT_SoVITS/pretrained_models/gsv-v2final-pretrained/s2G2333k.pth" \
  -g "GPT_SoVITS/pretrained_models/gsv-v2final-pretrained/s1bert25hz-5kh-longer-epoch=12-step=369668.ckpt" \
  -dr "reference_audio.wav" \
  -dt "Hello, this is a reference text." \
  -dl "en" \
  -a "0.0.0.0" \
  -p 9880 \
  -sm "normal" \
  -mt "wav"
```
````

--------------------------------

### Install GPT-SoVITS on Linux

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/README.md

Install GPT-SoVITS on Linux by running the install script. Replace `<DEVICE>` with the target device (GPU or CPU) and `<SOURCE>` with the model source (Hugging Face or ModelScope). The `--download-uvr5` flag optionally downloads the UVR5 models.

```bash
bash install.sh --device <DEVICE> --source <SOURCE> [--download-uvr5]
```

--------------------------------

### Install Conda and Execute Setup Script

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/Colab-WebUI.ipynb

This notebook cell installs Conda with condacolab and then runs the setup.sh script defined earlier in the notebook to configure the environment for GPT-SoVITS. Run it only after the cell that defines setup.sh.
```python
%pip install -q condacolab
import condacolab
condacolab.install_from_url("https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Linux-x86_64.sh")
!cd /content && bash setup.sh
```

--------------------------------

### Install GPT-SoVITS on macOS

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/README.md

Install GPT-SoVITS on macOS using the install script. Models trained on a Mac GPU come out noticeably lower in quality, so CPU training is recommended. Replace `<DEVICE>` with the device (MPS or CPU) and `<SOURCE>` with the model source.

```bash
bash install.sh --device <DEVICE> --source <SOURCE> [--download-uvr5]
```

--------------------------------

### Start WebUI for Training and Inference

Source: https://context7.com/rvc-boss/gpt-sovits/llms.txt

Launches the integrated WebUI for managing the GPT-SoVITS workflow, including training and inference. It can optionally be started with a specific language or model version.

```bash
# Start WebUI (auto-detects latest model version)
python webui.py

# Start with specific language
python webui.py en

# Start with V1 models
python webui.py v1
```

--------------------------------

### Starting the V2 API Server

Source: https://context7.com/rvc-boss/gpt-sovits/llms.txt

Starts the V2 API server with enhanced features, including parallel inference and streaming modes.

````APIDOC
## Starting the V2 API Server (api_v2.py)

The V2 API provides enhanced features including parallel inference, streaming modes, and batch processing with YAML configuration.

```bash
python api_v2.py \
  -c "GPT_SoVITS/configs/tts_infer.yaml" \
  -a "127.0.0.1" \
  -p 9880
```
````

--------------------------------

### Setup Environment Script for GPT-SoVITS

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/Colab-WebUI.ipynb

This shell script clones the GPT-SoVITS repository, creates a Conda environment named `GPTSoVITS` if it does not already exist, activates it, installs ipykernel, and runs the installation script. It only needs to be run once to set up the environment.
```shell
set -e

cd /content
git clone https://github.com/RVC-Boss/GPT-SoVITS.git
cd GPT-SoVITS

if conda env list | awk '{print $1}' | grep -Fxq "GPTSoVITS"; then
    :
else
    conda create -n GPTSoVITS python=3.10 -y
fi

source activate GPTSoVITS

pip install ipykernel

bash install.sh --device CU126 --source HF --download-uvr5
```

--------------------------------

### Install GPT-SoVITS WebUI on Windows

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/README.md

Use these commands to create a conda environment and install GPT-SoVITS with the chosen device and source. Replace `<DEVICE>` and `<SOURCE>` with your values; the `--DownloadUVR5` flag is optional.

```powershell
conda create -n GPTSoVits python=3.10
conda activate GPTSoVits
pwsh -F install.ps1 --Device <DEVICE> --Source <SOURCE> [--DownloadUVR5]
```

--------------------------------

### Setup Shell Script for GPT-SoVITS

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/Colab-Inference.ipynb

This script clones the GPT-SoVITS repository, creates the `GPT_weights` and `SoVITS_weights` directories, creates a Conda environment named `GPTSoVITS` if it does not already exist, and installs the required Python packages.
```shell
set -e

cd /content
git clone https://github.com/RVC-Boss/GPT-SoVITS.git
cd GPT-SoVITS

mkdir -p GPT_weights
mkdir -p SoVITS_weights

if conda env list | awk '{print $1}' | grep -Fxq "GPTSoVITS"; then
    :
else
    conda create -n GPTSoVITS python=3.10 -y
fi

source activate GPTSoVITS

pip install ipykernel

bash install.sh --device CU126 --source HF
```

--------------------------------

### Install FFmpeg on Ubuntu/Debian

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/README.md

Install FFmpeg and libsox-dev on Ubuntu or Debian systems using the apt package manager.

```bash
sudo apt install ffmpeg
sudo apt install libsox-dev
```

--------------------------------

### Create Conda Environment for BigVGAN

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/GPT_SoVITS/BigVGAN/README.md

Example command to create a conda environment with the specified Python and PyTorch versions, including CUDA support. Conda must already be installed.

```shell
conda create -n bigvgan python=3.10 pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda activate bigvgan
```

--------------------------------

### Start Inference-Only WebUI

Source: https://context7.com/rvc-boss/gpt-sovits/llms.txt

Launches a WebUI dedicated to inference, separate from the training functionality.

```bash
python GPT_SoVITS/inference_webui.py
```

--------------------------------

### Train 16kHz Model with Checkpoint Path

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/tools/AP_BWE_main/README.md

Example command to train the 16kHz model, specifying both the configuration file and the checkpoint path. Setting the checkpoint path lets you resume training or keep checkpoints organized.
```bash
CUDA_VISIBLE_DEVICES=0 python train_16k.py --config ../configs/config_2kto16k.json --checkpoint_path ../checkpoints/AP-BWE_2kto16k
```

--------------------------------

### Manually Install Dependencies for GPT-SoVITS

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/docs/cn/README.md

These commands set up a conda environment and install the Python dependencies for GPT-SoVITS from the requirements files. Use this method if the automated install scripts do not work.

```bash
conda create -n GPTSoVits python=3.10
conda activate GPTSoVits
pip install -r extra-req.txt --no-deps
pip install -r requirements.txt
```

--------------------------------

### Run Local Gradio Demo

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/GPT_SoVITS/BigVGAN/README.md

Commands to install the dependencies for the local Gradio demo and then run it. Run them from the BigVGAN directory.

```shell
pip install -r demo/requirements.txt
python demo/app.py
```

--------------------------------

### Install FFmpeg on macOS

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/README.md

Install FFmpeg on macOS using the Homebrew package manager.

```bash
brew install ffmpeg
```

--------------------------------

### Install FFmpeg using Conda

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/README.md

For Conda users: activate the GPTSoVits environment and install FFmpeg with conda.

```bash
conda activate GPTSoVits
conda install ffmpeg
```

--------------------------------

### Install GPT-SoVITS WebUI on Linux

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/docs/cn/README.md

Use these bash commands to create a conda environment and install the GPT-SoVITS dependencies on Linux. Replace `<DEVICE>` with the device (CUDA, ROCm, or CPU) and `<SOURCE>` with the package source.
```bash
conda create -n GPTSoVits python=3.10
conda activate GPTSoVits
bash install.sh --device <DEVICE> --source <SOURCE> [--download-uvr5]
```

--------------------------------

### Install GPT-SoVITS WebUI on macOS

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/docs/cn/README.md

Use these bash commands to create a conda environment and install the GPT-SoVITS dependencies on macOS. GPU training on a Mac may produce significantly worse results than other devices, so CPU training is recommended. Replace `<DEVICE>` with the device (MPS or CPU) and `<SOURCE>` with the package source.

```bash
conda create -n GPTSoVits python=3.10
conda activate GPTSoVits
bash install.sh --device <DEVICE> --source <SOURCE> [--download-uvr5]
```

--------------------------------

### Inference Quickstart with Hugging Face Hub

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/GPT_SoVITS/BigVGAN/README.md

Python code to run audio inference with a pretrained BigVGAN model from the Hugging Face Hub. It loads a model, computes a mel spectrogram, and generates a waveform. Setting `use_cuda_kernel=True` may speed up inference.

```python
import torch
import bigvgan
import librosa
from meldataset import get_mel_spectrogram

device = 'cuda'

# instantiate the model. You can optionally set use_cuda_kernel=True for faster inference.
model = bigvgan.BigVGAN.from_pretrained('nvidia/bigvgan_v2_24khz_100band_256x', use_cuda_kernel=False)

# remove weight norm in the model and set to eval mode
model.remove_weight_norm()
model = model.eval().to(device)

# load wav file and compute mel spectrogram
wav_path = '/path/to/your/audio.wav'
wav, sr = librosa.load(wav_path, sr=model.h.sampling_rate, mono=True) # wav is np.ndarray with shape [T_time] and values in [-1, 1]
wav = torch.FloatTensor(wav).unsqueeze(0) # wav is FloatTensor with shape [B(1), T_time]

# compute mel spectrogram from the ground truth audio
mel = get_mel_spectrogram(wav, model.h).to(device) # mel is FloatTensor with shape [B(1), C_mel, T_frame]

# generate waveform from mel
with torch.inference_mode():
    wav_gen = model(mel) # wav_gen is FloatTensor with shape [B(1), 1, T_time] and values in [-1, 1]
wav_gen_float = wav_gen.squeeze(0).cpu() # wav_gen_float is FloatTensor with shape [1, T_time]

# you can convert the generated waveform to 16 bit linear PCM
wav_gen_int16 = (wav_gen_float * 32767.0).numpy().astype('int16') # wav_gen_int16 is np.ndarray with shape [1, T_time] and int16 dtype
```

--------------------------------

### Dataset Annotation Example

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/docs/cn/README.md

An example `.list` file entry in the expected `audio_path|speaker_name|language|text` format, here for a Chinese annotation (the transcript 我爱玩原神 means "I love playing Genshin Impact").

```plaintext
D:\GPT-SoVITS\xxx/xxx.wav|xxx|zh|我爱玩原神.
```

--------------------------------

### Inference for 16kHz with Output Directory

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/tools/AP_BWE_main/README.md

Example command for 16kHz inference, specifying the generator checkpoint and a custom output directory for the generated audio files.
```bash
python inference_16k.py --checkpoint_file ../checkpoints/2kto16k/g_2kto16k --output_dir ../generated_files/2kto16k
```

--------------------------------

### Install Requirements for v2

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/README.md

Update the required packages when upgrading to v2. Make sure you have pulled the latest code from GitHub first.

```bash
pip install -r requirements.txt
```

--------------------------------

### Install BigVGAN Dependencies

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/GPT_SoVITS/BigVGAN/README.md

Commands to clone the BigVGAN repository and install its Python dependencies with pip. Run them after activating the conda environment.

```shell
git clone https://github.com/NVIDIA/BigVGAN
cd BigVGAN
pip install -r requirements.txt
```

--------------------------------

### Dataset Format Example

Source: https://context7.com/rvc-boss/gpt-sovits/llms.txt

Defines the required format for training data: a pipe-separated list of audio file path, speaker name, language, and transcribed text. Multiple languages are supported. (The last example line's text, 这是一个中文样本。, means "This is a Chinese sample.")

```text
# Format: audio_path|speaker_name|language|text
# Languages: zh (Chinese), en (English), ja (Japanese), ko (Korean), yue (Cantonese)

# Example training list file (training.list):
/data/audio/speaker1_001.wav|speaker1|en|Hello, this is a sample sentence.
/data/audio/speaker1_002.wav|speaker1|en|Another example of training data.
/data/audio/speaker1_003.wav|speaker1|zh|这是一个中文样本。
```

--------------------------------

### Start UVR5 Vocal Separation WebUI

Source: https://context7.com/rvc-boss/gpt-sovits/llms.txt

Launches the WebUI for UVR5, a tool that separates vocals from the accompaniment in audio files. The device, precision mode, and port must be specified.
```bash
# Start UVR5 WebUI
python tools/uvr5/webui.py "cuda" True 9873

# Parameters:
# - Device: "cuda" or "cpu"
# - is_half: True for half precision (faster on GPU)
# - port: WebUI port number
```

--------------------------------

### Launch GPT-SoVITS Web UI

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/Colab-Inference.ipynb

Starts the GPT-SoVITS web interface with the webui.py script inside the activated Conda environment. Setting `is_share=True` creates a public Gradio link.

```shell
!cd /content/GPT-SoVITS && source activate GPTSoVITS && export is_share=True && python webui.py
```

--------------------------------

### Basic TTS Inference via GET Request

Source: https://context7.com/rvc-boss/gpt-sovits/llms.txt

Synthesizes speech from text using the default reference audio configured at server startup.

````APIDOC
## Basic TTS Inference via GET Request

Synthesize speech from text using the default reference audio configured at server startup.

### Method
GET

### Endpoint
`http://127.0.0.1:9880/`

### Query Parameters
- **text** (string) - Required - The text to synthesize.
- **text_language** (string) - Required - The language of the text.
- **cut_punc** (string) - Optional - Punctuation character to split sentences.

### Request Example
```bash
# Simple TTS request using default reference audio
curl "http://127.0.0.1:9880?text=Hello%20world%2C%20this%20is%20a%20test.&text_language=en"

# With custom punctuation splitting
curl "http://127.0.0.1:9880?text=First%20sentence.%20Second%20sentence.&text_language=en&cut_punc=."
```

### Response
#### Success Response (200)
- **audio** (binary) - The synthesized audio stream.

#### Response Example
(Binary audio data)
````

--------------------------------

### Instantiate BigVGAN with CUDA Kernel

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/GPT_SoVITS/BigVGAN/README.md

Enable the fast CUDA inference kernel when initializing the BigVGAN model.
This requires CUDA to be installed and compatible with your PyTorch build.

```python
generator = BigVGAN(h, use_cuda_kernel=True)
```

--------------------------------

### Successful CUDA Kernel Test Output

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/GPT_SoVITS/BigVGAN/README.md

Example output from a successful test of the fused CUDA kernel against plain PyTorch BigVGAN inference. A small mean difference confirms that the kernel produces correct results.

```shell
loading plain Pytorch BigVGAN
...
loading CUDA kernel BigVGAN with auto-build
Detected CUDA files, patching ldflags
Emitting ninja build file /path/to/your/BigVGAN/alias_free_activation/cuda/build/build.ninja..
Building extension module anti_alias_activation_cuda...
...
Loading extension module anti_alias_activation_cuda...
...
Loading '/path/to/your/bigvgan_generator.pt'
...
[Success] test CUDA fused vs. plain torch BigVGAN inference
 > mean_difference=0.0007238413265440613
...
```

--------------------------------

### Run WebUI for GPT-SoVITS

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/README.md

Launches the GPT-SoVITS web UI. An optional language parameter can be appended; pass `v1` to use the V1 models.
```bash
python webui.py
```

```bash
python webui.py v1
```

--------------------------------

### Install Dependencies Manually

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/README.md

Install the project dependencies manually with pip, after activating the GPTSoVits conda environment. The first command installs the packages in extra-req.txt without their dependencies (`--no-deps`); the second installs everything in requirements.txt.

```bash
pip install -r extra-req.txt --no-deps
pip install -r requirements.txt
```

--------------------------------

### Run Inference WebUI for GPT-SoVITS

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/README.md

Launches the GPT-SoVITS inference web UI directly (an optional language parameter can be appended). Alternatively, start the main web UI with `python webui.py` and open inference from there.

```bash
python GPT_SoVITS/inference_webui.py
```

```bash
python webui.py
```

--------------------------------

### Run UVR5 WebUI from Command Line

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/README.md

Use this command to open the UVR5 WebUI. Replace the placeholders with your device, half-precision setting, and port.

```bash
python tools/uvr5/webui.py "<device>" <is_half> <port>
```

--------------------------------

### Change Default Reference Audio (GET)

Source: https://context7.com/rvc-boss/gpt-sovits/llms.txt

Alternatively, change the default reference audio with a GET request and query parameters. This is a less common way to change server state than POST.

```python
import requests

# GET method
requests.get(
    "http://127.0.0.1:9880/change_refer",
    params={
        "refer_wav_path": "new_reference.wav",
        "prompt_text": "New reference text.",
        "prompt_language": "en"
    }
)
```

--------------------------------

### Build Docker Image Locally

Source: https://context7.com/rvc-boss/gpt-sovits/llms.txt

Build the GPT-SoVITS Docker image locally. Specify the CUDA version and whether to build the lite version.
```bash
bash docker_build.sh --cuda 12.6 --lite
```

--------------------------------

### Launch GPT-SoVITS WebUI

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/Colab-WebUI.ipynb

This command launches the GPT-SoVITS WebUI. It changes to the project directory, activates the Conda environment, enables sharing, and runs the webui.py script. Run it after the environment is fully set up.

```shell
!cd /content/GPT-SoVITS && source activate GPTSoVITS && export is_share=True && python webui.py
```

--------------------------------

### Shutdown API Server

Source: https://context7.com/rvc-boss/gpt-sovits/llms.txt

Shut down the GPT-SoVITS API server with a GET request to the control endpoint. Use with caution.

```bash
# Shutdown the server
curl "http://127.0.0.1:9880/control?command=exit"
```

--------------------------------

### Build Docker Image Locally

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/README.md

Build the Docker image locally with the provided script. Specify the CUDA version and, optionally, the `--lite` flag for a lightweight image.

```bash
bash docker_build.sh --cuda <12.6|12.8> [--lite]
```

--------------------------------

### Restart API Server

Source: https://context7.com/rvc-boss/gpt-sovits/llms.txt

Restart the GPT-SoVITS API server with a GET request to the control endpoint. The server must be reachable.

```bash
# Restart the server
curl "http://127.0.0.1:9880/control?command=restart"
```

--------------------------------

### Initialize TTS and Run Inference

Source: https://context7.com/rvc-boss/gpt-sovits/llms.txt

Initializes the text-to-speech (TTS) pipeline with the given configuration and runs inference to synthesize speech. Make sure the reference audio and input parameters are set correctly.
```python
import soundfile as sf
from GPT_SoVITS.TTS_infer_pack.TTS import TTS, TTS_Config

config = TTS_Config({
    "device": "cuda",
    "is_half": True,
    "version": "v2",
    "t2s_weights_path": "GPT_SoVITS/pretrained_models/gsv-v2final-pretrained/s1bert25hz-5kh-longer-epoch=12-step=369668.ckpt",
    "vits_weights_path": "GPT_SoVITS/pretrained_models/gsv-v2final-pretrained/s2G2333k.pth",
    "bert_base_path": "GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large",
    "cnhuhbert_base_path": "GPT_SoVITS/pretrained_models/chinese-hubert-base"
})

tts = TTS(config)
tts.set_ref_audio("reference.wav")

inputs = {
    "text": "Text to synthesize into speech.",
    "text_lang": "en",
    "ref_audio_path": "reference.wav",
    "prompt_text": "Reference audio text.",
    "prompt_lang": "en",
    "top_k": 15,
    "top_p": 1.0,
    "temperature": 1.0,
    "speed_factor": 1.0,
    "parallel_infer": True
}

# tts.run() yields (sample_rate, audio) pairs; write the first one and stop
for sr, audio in tts.run(inputs):
    sf.write("output.wav", audio, sr)
    break
```

--------------------------------

### Change Default Reference Audio

Source: https://context7.com/rvc-boss/gpt-sovits/llms.txt

Update the default reference audio used when a request does not provide one. This can be done with either a POST or a GET request.

````APIDOC
## POST /change_refer

### Description
Updates the default reference audio for voice synthesis.

### Method
POST

### Endpoint
`http://127.0.0.1:9880/change_refer`

### Parameters
#### Request Body
- **refer_wav_path** (string) - Required - Path to the new reference audio file.
- **prompt_text** (string) - Required - Transcription of the reference audio.
- **prompt_language** (string) - Required - Language of the reference audio.

### Request Example
```json
{
  "refer_wav_path": "new_reference.wav",
  "prompt_text": "New reference audio transcription.",
  "prompt_language": "en"
}
```

## GET /change_refer

### Description
Updates the default reference audio for voice synthesis using query parameters.

### Method
GET

### Endpoint
`http://127.0.0.1:9880/change_refer`

### Parameters
#### Query Parameters
- **refer_wav_path** (string) - Required - Path to the new reference audio file.
- **prompt_text** (string) - Required - Transcription of the reference audio.
- **prompt_language** (string) - Required - Language of the reference audio.

### Request Example
```
http://127.0.0.1:9880/change_refer?refer_wav_path=new_reference.wav&prompt_text=New%20reference%20text.&prompt_language=en
```
````

--------------------------------

### Create and Activate Conda Environment

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/README.md

Use these commands to create a new conda environment named GPTSoVits with Python 3.10 and then activate it. This is a prerequisite for most installation steps.

```bash
conda create -n GPTSoVits python=3.10
conda activate GPTSoVits
```

--------------------------------

### Run Docker Compose Lite

Source: https://context7.com/rvc-boss/gpt-sovits/llms.txt

Use this command to run the GPT-SoVITS service from the smaller (lite) Docker image, suited to resource-constrained environments.

```bash
docker compose run --service-ports GPT-SoVITS-CU126-Lite
```

--------------------------------

### Create Symbolic Links for LibriTTS Dataset

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/GPT_SoVITS/BigVGAN/README.md

Prepare the LibriTTS dataset by creating symbolic links to its subset directories. The codebase needs these links to locate the training and validation files.

```shell
cd filelists/LibriTTS && \
ln -s /path/to/your/LibriTTS/train-clean-100 train-clean-100 && \
ln -s /path/to/your/LibriTTS/train-clean-360 train-clean-360 && \
ln -s /path/to/your/LibriTTS/train-other-500 train-other-500 && \
ln -s /path/to/your/LibriTTS/dev-clean dev-clean && \
ln -s /path/to/your/LibriTTS/dev-other dev-other && \
ln -s /path/to/your/LibriTTS/test-clean test-clean && \
ln -s /path/to/your/LibriTTS/test-other test-other && \
cd ../..
```

--------------------------------

### Download Models from Hugging Face

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/Colab-Inference.ipynb

Downloads GPT and SoVITS model checkpoints from Hugging Face, based on the repository and file paths you set. The URLs use Hugging Face's `/resolve/` path, which serves the raw file; a `/blob/` URL returns the HTML file-viewer page instead of the checkpoint.

```python
# Modify These
USER_ID = "AkitoP"
REPO_NAME = "GPT-SoVITS-v2-aegi"
BRANCH = "main"
GPT_PATH = "new_aegigoe-e100.ckpt"
SOVITS_PATH = "new_aegigoe_e60_s32220.pth"

# Do Not Modify
HF_BASE = "https://huggingface.co"
REPO_ID = f"{USER_ID}/{REPO_NAME}"
GPT_URL = f"{HF_BASE}/{REPO_ID}/resolve/{BRANCH}/{GPT_PATH}"
SOVITS_URL = f"{HF_BASE}/{REPO_ID}/resolve/{BRANCH}/{SOVITS_PATH}"

!cd "/content/GPT-SoVITS/GPT_weights" && wget "{GPT_URL}"
!cd "/content/GPT-SoVITS/SoVITS_weights" && wget "{SOVITS_URL}"
```

--------------------------------

### Train BigVGAN Model with LibriTTS Dataset

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/GPT_SoVITS/BigVGAN/README.md

Start training a BigVGAN-v2 model on the LibriTTS dataset. This command specifies the configuration file, dataset paths, and checkpoint location.

```shell
python train.py \
  --config configs/bigvgan_v2_24khz_100band_256x.json \
  --input_wavs_dir filelists/LibriTTS \
  --input_training_file filelists/LibriTTS/train-full.txt \
  --input_validation_file filelists/LibriTTS/val-full.txt \
  --list_input_unseen_wavs_dir filelists/LibriTTS filelists/LibriTTS \
  --list_input_unseen_validation_file filelists/LibriTTS/dev-clean.txt filelists/LibriTTS/dev-other.txt \
  --checkpoint_path exp/bigvgan_v2_24khz_100band_256x
```

--------------------------------

### Enable CUDA Kernel via Command Line

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/GPT_SoVITS/BigVGAN/README.md

Activate the custom CUDA inference kernel for the synthesis scripts by passing the `--use_cuda_kernel` flag. The kernel is built automatically on first use.

```shell
python inference.py --use_cuda_kernel ...
```

```shell
python inference_e2e.py --use_cuda_kernel ...
```

--------------------------------

### Docker Deployment with Docker Compose

Source: https://context7.com/rvc-boss/gpt-sovits/llms.txt

Deploys GPT-SoVITS with Docker Compose using the CUDA 12.6 service, with its ports exposed.

```bash
# Using Docker Compose (CUDA 12.6)
docker compose run --service-ports GPT-SoVITS-CU126
```

--------------------------------

### V2 API TTS Synthesis with Streaming

Source: https://context7.com/rvc-boss/gpt-sovits/llms.txt

Performs text-to-speech synthesis through the V2 API with streaming enabled for continuous audio output. The example also shows advanced parameters for batch processing, text-splitting strategy, and streaming mode. Requires the `requests` library.

```python
import requests

# V2 API TTS request with streaming
response = requests.post(
    "http://127.0.0.1:9880/tts",
    json={
        "text": "This is a long text that will be synthesized with streaming support.",
        "text_lang": "en",
        "ref_audio_path": "reference.wav",
        "prompt_text": "Reference audio transcription.",
        "prompt_lang": "en",
        "top_k": 15,
        "top_p": 1.0,
        "temperature": 1.0,
        "text_split_method": "cut5",  # Splitting strategy
        "batch_size": 1,
        "batch_threshold": 0.75,
        "split_bucket": True,
        "speed_factor": 1.0,
        "fragment_interval": 0.3,
        "seed": 42,
        "parallel_infer": True,
        "repetition_penalty": 1.35,
        "sample_steps": 32,
        "super_sampling": False,
        "streaming_mode": 1,  # 0: disabled, 1: best quality, 2: medium, 3: fast
        "media_type": "wav"
    },
    stream=True
)
response.raise_for_status()  # stop on an HTTP error instead of saving the error body as audio

# Save streaming audio
with open("output.wav", "wb") as f:
    for chunk in response.iter_content(chunk_size=1024):
        f.write(chunk)
```

--------------------------------

### Initialize Python TTS Class

Source: https://context7.com/rvc-boss/gpt-sovits/llms.txt

Shows the import statement for using the GPT-SoVITS TTS pipeline directly from a Python script. Further usage involves instantiating the TTS and TTS_Config classes.
```python
from GPT_SoVITS.TTS_infer_pack.TTS import TTS, TTS_Config
```

--------------------------------

### Download Models from ModelScope

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/Colab-Inference.ipynb

Downloads GPT and SoVITS model checkpoints from ModelScope using the specified user, repository, branch, and file paths.

```python
# Modify These
USER_ID = "aihobbyist"
REPO_NAME = "GPT-SoVits-V2-models"
BRANCH = "master"
GPT_PATH = "Genshin_Impact/EN/GPT_GenshinImpact_EN_5.1.ckpt"
SOVITS_PATH = "Wuthering_Waves/CN/SV_WutheringWaves_CN_1.3.pth"

# Do Not Modify
HF_BASE = "https://www.modelscope.cn/models"
REPO_ID = f"{USER_ID}/{REPO_NAME}"
GPT_URL = f"{HF_BASE}/{REPO_ID}/resolve/{BRANCH}/{GPT_PATH}"
SOVITS_URL = f"{HF_BASE}/{REPO_ID}/resolve/{BRANCH}/{SOVITS_PATH}"

!cd "/content/GPT-SoVITS/GPT_weights" && wget "{GPT_URL}"
!cd "/content/GPT-SoVITS/SoVITS_weights" && wget "{SOVITS_URL}"
```

--------------------------------

### Command Line Interface for Batch TTS Synthesis

Source: https://context7.com/rvc-boss/gpt-sovits/llms.txt

Performs batch TTS synthesis from the command line. You must specify the GPT and SoVITS model paths, the reference audio, and text files for the reference and target text. The language arguments take Chinese labels; `英文` means English.

```bash
python GPT_SoVITS/inference_cli.py \
  --gpt_model "GPT_SoVITS/pretrained_models/gsv-v2final-pretrained/s1bert25hz-5kh-longer-epoch=12-step=369668.ckpt" \
  --sovits_model "GPT_SoVITS/pretrained_models/gsv-v2final-pretrained/s2G2333k.pth" \
  --ref_audio "reference_audio.wav" \
  --ref_text "reference_text.txt" \
  --ref_language "英文" \
  --target_text "target_text.txt" \
  --target_language "英文" \
  --output_path "output/"
```

--------------------------------

### Train 48kHz Model

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/tools/AP_BWE_main/README.md

Starts training the 48kHz speech bandwidth extension model. Requires a configuration file path. Checkpoints are saved in `cp_model` by default.
```bash
cd train
CUDA_VISIBLE_DEVICES=0 python train_48k.py --config [config file path]
```

--------------------------------

### Run a Specific Docker Compose Service

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/README.md

Run a specific GPT-SoVITS service with Docker Compose. Replace `<SERVICE_NAME>` with the service you want, choosing between the full and lite images and between CUDA versions (for example `GPT-SoVITS-CU126` or `GPT-SoVITS-CU126-Lite`). Run it from the project root directory.

```bash
docker compose run --service-ports <SERVICE_NAME>
```

--------------------------------

### Train 16kHz Model

Source: https://github.com/rvc-boss/gpt-sovits/blob/main/tools/AP_BWE_main/README.md

Starts training the 16kHz speech bandwidth extension model. Requires a configuration file path. Checkpoints are saved in `cp_model` by default.

```bash
cd train
CUDA_VISIBLE_DEVICES=0 python train_16k.py --config [config file path]
```

--------------------------------

### Switch SoVITS Model via API

Source: https://context7.com/rvc-boss/gpt-sovits/llms.txt

Changes the SoVITS model weights at runtime with a cURL request. The server must be running and the weights path must be valid.

```bash
# Switch SoVITS model
curl "http://127.0.0.1:9880/set_sovits_weights?weights_path=GPT_SoVITS/pretrained_models/s2Gv3.pth"
```
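--------------------------------

The dataset snippets above describe `.list` annotation files as `audio_path|speaker_name|language|text` lines. Checking a list file before training catches malformed lines early. The following is a minimal sketch, not part of GPT-SoVITS: `ListEntry`, `parse_list_line`, and `parse_list_file` are hypothetical names. It assumes the transcript is the last field (so any extra `|` stays in the text), that only the five language codes listed in the Dataset Format Example are valid, and that codes may be matched case-insensitively.

```python
from dataclasses import dataclass

# Language codes listed in the "Dataset Format Example" snippet.
SUPPORTED_LANGS = {"zh", "en", "ja", "ko", "yue"}


@dataclass
class ListEntry:
    audio_path: str
    speaker: str
    lang: str
    text: str


def parse_list_line(line: str) -> ListEntry:
    """Parse one `audio_path|speaker_name|language|text` line (hypothetical helper)."""
    # maxsplit=3: assumes the transcript is the final field, so a '|' inside it is kept
    fields = line.rstrip("\r\n").split("|", 3)
    if len(fields) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(fields)}: {line!r}")
    audio_path, speaker, lang, text = (f.strip() for f in fields)
    lang = lang.lower()  # assumption: language codes are matched case-insensitively
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language code {lang!r}")
    if not audio_path or not text:
        raise ValueError(f"empty audio path or transcript: {line!r}")
    return ListEntry(audio_path, speaker, lang, text)


def parse_list_file(path: str) -> list[ListEntry]:
    """Parse every non-blank line of a .list file, reporting the line number on error."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                entries.append(parse_list_line(line))
            except ValueError as exc:
                raise ValueError(f"{path}:{lineno}: {exc}") from exc
    return entries
```

Calling `parse_list_file("training.list")` either returns every entry or raises on the first malformed line, with the file name and line number in the message.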
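--------------------------------

The GET examples for `api.py` above percent-encode the query string by hand. A minimal sketch can build the same URLs with Python's standard library and save the returned audio, assuming only the `text`, `text_language`, and optional `cut_punc` query parameters documented in the Basic TTS Inference snippet. `build_tts_url` and `save_tts` are hypothetical helper names, not part of GPT-SoVITS.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_tts_url(base_url: str, text: str, text_language: str,
                  cut_punc: Optional[str] = None) -> str:
    """Build a GET URL for api.py's root endpoint (hypothetical helper)."""
    params = {"text": text, "text_language": text_language}
    if cut_punc is not None:
        params["cut_punc"] = cut_punc
    # quote_via=quote encodes spaces as %20 (the default quote_plus would use '+')
    return f"{base_url.rstrip('/')}/?{urlencode(params, quote_via=quote)}"


def save_tts(url: str, out_path: str, timeout: float = 120.0) -> None:
    """Fetch synthesized audio from a running api.py server and write it to disk."""
    # urlopen raises urllib.error.HTTPError on 4xx/5xx, so error bodies are not saved
    with urlopen(url, timeout=timeout) as resp, open(out_path, "wb") as f:
        # Copy the response body in chunks so long audio is not held in memory at once
        while chunk := resp.read(64 * 1024):
            f.write(chunk)
```

With the server running, `save_tts(build_tts_url("http://127.0.0.1:9880", "Hello world, this is a test.", "en"), "output.wav")` writes the synthesized audio to `output.wav`. Letting `urlencode` handle the escaping avoids hand-encoding commas, spaces, and non-ASCII text.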