### Run Default Example (CLI)

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/09_quick_reference.md

Execute the main script with default example files.

```bash
# Default (example files)
python run.py
```

--------------------------------

### Generate Video from Audio and Video (CLI)

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/11_integration_examples.md

Use this command-line example to generate a video from speech audio and a source video. Run the check script first to confirm the environment is set up.

```bash
cd /workspace/heygem-linux-python-hack

# Verify setup
python check_env/check_onnx_cuda.py

# Generate video
python run.py --audio_path data/speech.wav --video_path data/face.mp4

# Output: result/{task_id}/{task_id}-r.mp4
```

--------------------------------

### Install and Verify Project Dependencies

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/08_error_handling_and_troubleshooting.md

Prints the version of each required Python package to verify that it is installed and importable. These commands do not install anything; an `ImportError` means the package is missing.

```bash
python -c "import cv2; print('OpenCV:', cv2.__version__)"
```

```bash
python -c "import torch; print('PyTorch:', torch.__version__)"
```

```bash
python -c "import onnxruntime; print('ONNX Runtime:', onnxruntime.__version__)"
```

```bash
python -c "import gradio; print('Gradio:', gradio.__version__)"
```

--------------------------------

### Create Dataset Example

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/03_face_processing_pipeline.md

Uses the create_dataset function to instantiate a dataloader with specific options, then iterates through the loaded batches.
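The dataset examples in this reference build their options object with `type('Options', (), {...})()`. A possibly more readable equivalent is `types.SimpleNamespace` from the standard library. This is only a sketch: it assumes `create_dataset` reads plain attributes from the object and needs nothing else from it, which the source does not confirm.

```python
from types import SimpleNamespace

# Stand-in options object; the field names mirror the example that follows.
opt = SimpleNamespace(
    dataset_mode="l2faceaudio",
    img_size=256,
    batch_size=8,
    num_threads=4,
    serial_batches=False,
    max_dataset_size=50000,
    distributed=False,
)

# Attribute access behaves the same as with the type(...)() idiom.
print(opt.dataset_mode, opt.batch_size)
```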
```python
from landmark2face_wy.data import create_dataset

opt = type('Options', (), {
    'dataset_mode': 'l2faceaudio',
    'img_size': 256,
    'batch_size': 8,
    'num_threads': 4,
    'serial_batches': False,
    'max_dataset_size': 50000,
    'distributed': False,
})()

dataloader = create_dataset(opt, mode='train')
for batch in dataloader:
    images = batch['B']       # (8, 3, 256, 256)
    audio = batch['A_label']  # (8, 1, 256, 256)
```

--------------------------------

### Install FFmpeg

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/09_quick_reference.md

Install FFmpeg for handling audio and video processing. This is a system-level dependency.

```bash
apt-get install ffmpeg
```

--------------------------------

### Start Gradio Web Server (CLI)

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/09_quick_reference.md

Start the Gradio web interface server. Access the UI at http://localhost:7860 to upload files.

```bash
# Start server
python app.py

# Access at http://localhost:7860
# Upload audio and video files via UI
```

--------------------------------

### Command-Line Training Example

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/06_neural_network_models.md

Example command for training the L2Face model on a specified dataset with various configuration options, including image size, batch size, learning rate, and distributed training.

```bash
# Train L2Face on 256x256 audio dataset
python train.py \
  --model l2face \
  --dataset_mode l2faceaudio \
  --img_size 256 \
  --batch_size 8 \
  --lr 0.0002 \
  --niter 200 \
  --feature_path /data/features \
  --checkpoints_dir ./checkpoints \
  --distributed
```

--------------------------------

### Example Usage of Global Configuration

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/07_types_and_configuration.md

Demonstrates creating directories based on configuration and checking for required files like watermarks.
```python
from y_utils.config import GlobalConfig
import os

config = GlobalConfig.instance()

# Create directories if they don't exist
os.makedirs(config.result_dir, exist_ok=True)
os.makedirs(config.temp_dir, exist_ok=True)

# Verify watermark files exist
if not os.path.exists(config.watermark_path):
    raise FileNotFoundError(f"Watermark not found: {config.watermark_path}")

# Use in video processing
task_id = "task_1"  # illustrative value; use the id of the task being processed
task_result_dir = os.path.join(config.result_dir, task_id)
os.makedirs(task_result_dir, exist_ok=True)
```

--------------------------------

### Install Project Dependencies

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/README.md

Install project dependencies using pip. Installing the whole requirements.txt in one step may fail; if it does, run the code, read the error messages, and install the missing packages one at a time. Consider an AutoDL environment if setup problems persist.

```bash
# Installing the entire requirements.txt directly may not succeed.
# It is more reliable to run the code, observe the error messages,
# and install packages based on those errors and the requirements file.
# pip install -r requirements.txt
```

--------------------------------

### Install Gradio

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/09_quick_reference.md

Install Gradio, which provides the web UI.

```bash
pip install gradio
```

--------------------------------

### Run Main Application

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/README_en.md

Executes the main Python script to run the application with the provided default examples.

```bash
python run.py
```

--------------------------------

### Install Project Dependencies

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/README_en.md

Installs the necessary Python packages for the project. Ensure Python 3.8 is installed beforehand.
```bash
pip install -r requirements.txt
```

--------------------------------

### Command-Line Testing Example

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/06_neural_network_models.md

Example command for testing a trained model, specifying the model, dataset mode, image size, and the iteration to load from checkpoints.

```bash
# Test trained model
python test.py \
  --model l2face \
  --dataset_mode l2faceaudio \
  --img_size 256 \
  --load_iter 200 \
  --feature_path /data/features
```

--------------------------------

### L2FaceAudioDataset Example Usage

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/03_face_processing_pipeline.md

Demonstrates how to create and iterate over the L2FaceAudioDataset using a dummy options object and the create_dataset utility.

```python
from landmark2face_wy.data import create_dataset

opt = type('Options', (), {
    'dataset_mode': 'l2faceaudio',
    'img_size': 256,
    'name': 'identity_train',
    'feature_path': '/data/features',
    'audio_feature': 'mfcc',
    'batch_size': 4,
    'num_threads': 4,
    'max_dataset_size': 10000,
    'distributed': False,
    'serial_batches': False,
})()

dataset = create_dataset(opt, mode='train')
for batch in dataset:
    print(batch['A'].shape)        # (4, 3, 256, 256)
    print(batch['A_label'].shape)  # (4, 1, 256, 256)
```

--------------------------------

### Launch Web UI

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/09_quick_reference.md

Starts the web UI application. Access the UI by opening http://localhost:7860 in a web browser after execution.

```bash
python app.py
# Then open http://localhost:7860 in browser
```

--------------------------------

### Launch Gradio Web UI with Custom Configuration

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/11_integration_examples.md

This Python script sets up a Gradio web interface for video processing.
It allows users to upload audio and video files and configure custom output and temporary directories. Ensure Gradio is installed (`pip install gradio`).

```python
import os
import sys
from y_utils.config import GlobalConfig
from app import VideoProcessor
import gradio as gr

# Configure output directories
config = GlobalConfig.instance()
config.result_dir = os.path.abspath('custom_output')
config.temp_dir = os.path.abspath('custom_temp')

# Create directories
os.makedirs(config.result_dir, exist_ok=True)
os.makedirs(config.temp_dir, exist_ok=True)

# Initialize processor
processor = VideoProcessor()

# Create Gradio interface
inputs = [
    gr.File(label="Audio (WAV format)"),
    gr.File(label="Video (MP4 format)"),
    gr.Checkbox(label="Add Watermark"),
    gr.Checkbox(label="Add Digital Auth"),
]
outputs = gr.Video(label="Generated Video")

demo = gr.Interface(
    fn=processor.process_video,
    inputs=inputs,
    outputs=outputs,
    title="HeyGem Digital Human Generator",
    description="Upload audio and video files to generate digital human videos",
    examples=[
        ["example/audio.wav", "example/video.mp4", False, False],
    ],
)

if __name__ == "__main__":
    demo.queue().launch(server_name="0.0.0.0", server_port=7860)
```

--------------------------------

### Install Project Dependencies

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/08_error_handling_and_troubleshooting.md

Installs all required Python packages from a requirements file or individually. Use this when encountering ImportError for missing packages.

```bash
pip install -r requirements.txt
pip install typeguard
pip install torch torchvision
pip install opencv-python
pip install gradio
pip install flask
python -c "from typeguard import check_argument_types; print('OK')"
```

--------------------------------

### Install OpenCV

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/09_quick_reference.md

Install the OpenCV library for computer vision tasks.
This is required for image and video processing.

```bash
pip install opencv-python
```

--------------------------------

### Example: Result Queue Usage

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/07_types_and_configuration.md

Demonstrates how to put success and failure messages into the result queue and how to retrieve and interpret them.

```python
import queue

result_queue = queue.Queue()

# Success
result_queue.put([True, "/output/result/task_1-r.mp4"])

# Failure
result_queue.put([False, "Frame processing error: corrupted frame batch"])

# Check result
success, content = result_queue.get()
if success:
    print(f"Video generated: {content}")
else:
    print(f"Error: {content}")
```

--------------------------------

### Example: Output Frame Queue Usage

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/07_types_and_configuration.md

Illustrates how to populate the output frame queue with normal frames, a completion signal, and an error signal.

```python
import numpy as np
import queue

frame_queue = queue.Queue()

# Add normal frames
frame1 = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)
frame2 = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)
frame_queue.put((False, "ok", [frame1, frame2]))

# Signal completion
frame_queue.put((True, "done", None))

# Signal error
frame_queue.put((False, "Out of memory", None))
```

--------------------------------

### Clone Repository and Download Assets

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/README_en.md

Clones the project repository and executes a download script for necessary assets. This is part of the initial setup.
```bash
git clone https://github.com/Holasyb918/HeyGem-Linux-Python-Hack
cd HeyGem-Linux-Python-Hack
bash download.sh
```

--------------------------------

### Load Audio File

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/05_audio_processing.md

Loads an audio file using the soundfile library. Ensure the 'soundfile' library is installed.

```python
import soundfile as sf

audio, sr = sf.read('input.wav')  # Returns (samples, sr)
```

--------------------------------

### Install ONNX Runtime GPU

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/09_quick_reference.md

Install the GPU-enabled version of ONNX Runtime. Ensure your CUDA and cuDNN versions are compatible.

```bash
pip install onnxruntime-gpu==1.16.0
```

--------------------------------

### Test CUDA Setup

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/11_integration_examples.md

Performs a comprehensive test of the CUDA setup using PyTorch and ONNX Runtime. It checks for CUDA availability, device information, and ONNX Runtime's GPU provider status.
```python
import torch
import onnxruntime


def test_cuda_setup():
    """Comprehensive CUDA validation."""
    print("=== CUDA Setup Test ===")

    # PyTorch
    print(f"PyTorch CUDA available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        print(f"CUDA device: {torch.cuda.get_device_name(0)}")
        print(f"CUDA capability: {torch.cuda.get_device_capability(0)}")

    # ONNX Runtime
    print(f"ONNX Runtime version: {onnxruntime.__version__}")
    providers = onnxruntime.get_available_providers()
    print(f"Available providers: {providers}")

    # Simple inference test
    from check_env.check_onnx_cuda import check_gpu_usage
    is_cuda, session = check_gpu_usage()
    print(f"ONNX Runtime using GPU: {is_cuda}")

    return is_cuda and torch.cuda.is_available()


if __name__ == "__main__":
    success = test_cuda_setup()
    print(f"\n{'✓' if success else '✗'} CUDA setup is {'correct' if success else 'problematic'}")
```

--------------------------------

### main()

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/02_entry_points.md

The main execution function for the batch processing workflow. It parses arguments, initializes the processing task, and starts the video generation process.

```APIDOC
## Function: `main()`

### Description
Main execution function for batch processing workflow.

### Behavior:
1. Parses command-line arguments using `get_args()`
2. Validates audio and video paths; falls back to examples if missing
3. Clears `sys.argv` (for service initialization)
4. Initializes `TransDhTask()` from `service.trans_dh_service`
5. Sleeps 10 seconds (async initialization workaround)
6. Calls `task.work(audio_url, video_url, "1004", 0, 0, 0, 0)`, where the third argument is the task code
7. Does **not** explicitly exit or return results; video is written to disk asynchronously

### Side Effects:
- Generates output video in configured `result_dir`
- Prints progress and completion status to console and logs

### Example:
```python
# Direct invocation:
if __name__ == "__main__":
    main()

# Usage:
# python run.py
# python run.py --audio_path data/speech.wav --video_path data/face.mp4
```

### Source: `run.py:172-192`
```

--------------------------------

### Configure CUDA Environment Variables

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/07_types_and_configuration.md

Sets up environment variables for CUDA and cuDNN. Ensure these paths are correct for your system's CUDA installation.

```bash
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda/bin:$PATH
```

--------------------------------

### Verify FFmpeg Availability

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/08_error_handling_and_troubleshooting.md

Checks if FFmpeg is installed and accessible in the system's PATH by running `ffmpeg -version`.

```bash
# Test FFmpeg availability
ffmpeg -version
```

--------------------------------

### CustomDatasetDataLoader __iter__ Example

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/03_face_processing_pipeline.md

Demonstrates the usage of the __iter__ method for CustomDatasetDataLoader, showing how to iterate through batches of data. Each batch is a dictionary containing various data components.

```python
dataloader = create_dataset(opt, mode='train')
for batch in dataloader:
    # batch is dict with keys: A, A_label, B, B_label, mask_B
    pass
```

--------------------------------

### Catch Custom Project Errors

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/07_types_and_configuration.md

Demonstrates how to catch `CustomError` exceptions.
Includes example handling for error messages that mention 'face' or 'memory'.

```python
from h_utils.custom import CustomError
from y_utils.logger import logger

# task, audio_path, video_path, and code are assumed to be defined by the caller
try:
    task.work(audio_path, video_path, code, 0, 0, 0, 0)
except CustomError as e:
    logger.error(f"Task failed: {e}")
    # Handle specific error
    if "face" in str(e).lower():
        print("No faces detected; please check video content")
    elif "memory" in str(e).lower():
        print("GPU out of memory; reduce batch size or use CPU")
```

--------------------------------

### End-to-End Audio Processing Pipeline

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/05_audio_processing.md

A complete example demonstrating the audio processing workflow from loading an audio file to preparing a PyTorch tensor for a neural network. Includes MFCC extraction, CMVN normalization, padding, and tensor conversion.

```python
from wenet.utils.cmvn import load_cmvn
from wenet.tools._extract_feats import extract_mfcc
import numpy as np
import soundfile as sf
import torch

# 1. Load audio
audio_path = 'speech.wav'
audio, sr = sf.read(audio_path)

# 2. Extract MFCC
mfcc = extract_mfcc(audio, sample_rate=sr, n_mfcc=13)
# Shape: (T, 13)

# 3. Normalize
means, inv_stds = load_cmvn('cmvn.json', is_json=True)
normalized = (mfcc - means) * inv_stds

# 4. Pad to network input size (features longer than 256 frames are truncated)
padded = np.zeros((256, 256), dtype=np.float32)
rows = min(normalized.shape[0], 256)
cols = min(normalized.shape[1], 256)
padded[:rows, :cols] = normalized[:rows, :cols]

# 5. Convert to tensor
audio_tensor = torch.from_numpy(padded).unsqueeze(0).unsqueeze(0)
# Shape: (1, 1, 256, 256) for batch_size=1
# Ready for L2FaceAudio network input
```

--------------------------------

### Access Global Configuration Instance

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/07_types_and_configuration.md

Use GlobalConfig.instance() to get the singleton configuration object. Access properties like result_dir and temp_dir.
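Because `GlobalConfig.instance()` returns a singleton, every caller sees the same object, so a path changed in one module is visible everywhere else. A minimal sketch of how such an accessor commonly works; `DemoConfig` is a hypothetical stand-in written for illustration and is not the project's `GlobalConfig` implementation.

```python
import os
import threading


class DemoConfig:
    """Hypothetical singleton config; not the project's GlobalConfig."""

    _instance = None
    _lock = threading.Lock()

    def __init__(self):
        # Illustrative defaults only; the real class defines its own values.
        self.result_dir = os.path.abspath("result")
        self.temp_dir = os.path.abspath("temp")

    @classmethod
    def instance(cls):
        # Create the shared object on first use, then always return the same one.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance


a = DemoConfig.instance()
b = DemoConfig.instance()
a.result_dir = os.path.abspath("custom_output")

print(a is b)                        # both names refer to one object
print(b.result_dir == a.result_dir)  # the change made through a is visible through b
```

One consequence of this pattern is that configuration should be set before any worker reads it, since there is no per-task copy.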
```python
from y_utils.config import GlobalConfig

config = GlobalConfig.instance()
result_dir = config.result_dir
temp_dir = config.temp_dir
watermark_path = config.watermark_path
digital_auth_path = config.digital_auth_path
```

--------------------------------

### Parse Command-Line Arguments

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/02_entry_points.md

Parses command-line arguments for the video processing task. Returns an object with `.audio_path` and `.video_path` attributes, which default to the example files when no paths are given.

```python
from run import get_args

args = get_args()
print(args.audio_path)  # 'example/audio.wav'
print(args.video_path)  # 'example/video.mp4'
```

--------------------------------

### Custom Loss Function Usage

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/06_neural_network_models.md

Example demonstrating the use of custom perceptual and adversarial loss functions for video generation. The perceptual term targets image quality and the adversarial term targets realism.

```python
from landmark2face_wy.loss import PerceptualLoss, AdversarialLoss
import torch

perceptual_loss = PerceptualLoss()
adversarial_loss = AdversarialLoss()

# generator, discriminator, audio, landmarks, and batch come from the surrounding training loop
fake_images = generator(audio, landmarks)
target_images = batch['B']

loss_perc = perceptual_loss(fake_images, target_images)
loss_adv = adversarial_loss(discriminator(fake_images))
total_loss = loss_perc + 0.1 * loss_adv
```

--------------------------------

### Provide System and Environment Diagnostics

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/08_error_handling_and_troubleshooting.md

Collects and displays essential diagnostic information about the Python environment, GPU setup, and recent errors. Include this output when reporting issues.
```bash
python --version
```

```bash
nvidia-smi
```

```bash
python check_env/check_onnx_cuda.py
```

```bash
tail -20 log/error.log
```

--------------------------------

### Custom Model Implementation

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/06_neural_network_models.md

Example of implementing a custom model by inheriting from BaseModel. Defines essential attributes like loss_names, model_names, visual_names, and optimizers, and implements required methods for training.

```python
class CustomModel(BaseModel):
    def __init__(self, opt):
        BaseModel.__init__(self, opt)
        # Define lists below
        self.loss_names = ['loss_g', 'loss_d']
        self.model_names = ['G', 'D']
        self.visual_names = ['input', 'output']
        self.optimizers = [optimizer_g, optimizer_d]

    def set_input(self, input):
        # Unpack data dictionary
        self.input = input['A']
        self.label = input['A_label']

    def forward(self):
        # Produce outputs
        self.output = self.netG(self.input, self.label)

    def optimize_parameters(self):
        # Calculate losses and update weights
        loss_g = self.criterion(self.output, self.target)
        self.optimizer.zero_grad()
        loss_g.backward()
        self.optimizer.step()

    @staticmethod
    def modify_commandline_options(parser, is_train):
        # Add custom options
        parser.add_argument('--custom_option', type=int, default=100)
        return parser
```

--------------------------------

### Run with Custom Files (CLI)

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/09_quick_reference.md

Execute the main script specifying custom audio and video file paths.

```bash
# Custom audio and video
python run.py --audio_path data/speech.wav --video_path data/person.mp4
```

--------------------------------

### Install Typeguard Package

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/README_tts_f2f.MD

Install the 'typeguard' Python package using pip to resolve ImportError related to 'check_argument_types'. This is a common dependency issue.
```bash
pip install typeguard
```

--------------------------------

### Run Gradio Interface

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/README_en.md

Launches the Gradio web interface for the application. Wait for the processor initialization to complete before use.

```bash
python app.py
```

--------------------------------

### Import Configuration and Logging Utilities

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/10_api_index_and_glossary.md

Import classes for global configuration and custom error handling, along with a logger.

```python
from y_utils.config import GlobalConfig
from y_utils.logger import logger
from h_utils.custom import CustomError
```

--------------------------------

### Parse Training Options

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/06_neural_network_models.md

Parses command-line arguments and loads configuration from files to set up training options. This is typically the first step in a training script.

```python
from landmark2face_wy.options.train_options import TrainOptions

opt = TrainOptions().parse()
# Parses command-line args + loads from config files

print(opt.model)       # 'l2face'
print(opt.batch_size)  # 4
print(opt.img_size)    # 256
```

--------------------------------

### Complete Training Pipeline

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/06_neural_network_models.md

Illustrates the complete training workflow, including loading options, creating the dataset and model, and iterating through the training loop to update parameters and save checkpoints.

```python
from landmark2face_wy.options.train_options import TrainOptions
from landmark2face_wy.data import create_dataset
from landmark2face_wy.models import create_model

# 1. Load options
opt = TrainOptions().parse()
# Command: python train.py --model l2face --dataset_mode l2faceaudio --batch_size 8

# 2. Create dataset
dataset = create_dataset(opt, mode='train')
dataset_size = len(dataset)
print(f'Dataset size: {dataset_size}')

# 3. Create model
model = create_model(opt)

# 4. Training loop
for epoch in range(1, opt.niter + 1):
    epoch_iter = 0
    for i, data in enumerate(dataset):
        epoch_iter += 1

        # Forward pass
        model.set_input(data)
        model.forward()

        # Update weights
        model.optimize_parameters()

        # Log losses
        if epoch_iter % 100 == 0:
            losses = model.get_current_losses()
            print(f'Epoch {epoch}, Iter {epoch_iter}: {losses}')

    # Save checkpoint
    if epoch % 10 == 0:
        model.save_networks(epoch)
```

--------------------------------

### Initialize Global Configuration Paths

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/07_types_and_configuration.md

Set absolute paths for configuration properties like result_dir, temp_dir, and watermark_path. Ensure execution from the project root.

```python
import os
from y_utils.config import GlobalConfig

# Set environment or call GlobalConfig methods (not directly exposed)
config = GlobalConfig.instance()

# Paths should be absolute or properly resolved relative to working directory
config.result_dir = os.path.abspath('result')
config.temp_dir = os.path.abspath('temp')
config.watermark_path = os.path.abspath('assets/watermark.png')
config.digital_auth_path = os.path.abspath('assets/digital_badge.png')
```

--------------------------------

### Initialize ModelBase for ONNX, TensorRT, or Encrypted Models

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/04_model_loading_and_inference.md

Demonstrates initializing the `ModelBase` class with configurations for ONNX, TensorRT engines, and encrypted ONNX models.
```python
from model_lib.model_base import ModelBase

# ONNX model configuration
model_config = {
    'model_path': '/models/face_detect.onnx',
    'input_dynamic_shape': {'input': [1, 3, 640, 640]},
}
model = ModelBase(model_config, provider='cuda')

# TensorRT engine
trt_config = {
    'model_path': '/models/generator.engine',
    'trt_wrapper_self': True,
}
model = ModelBase(trt_config, provider='cuda')

# Encrypted ONNX
encrypted_config = {
    'model_path': '/models/private.onnx',
    'encrypt': 'my_secret_key',
    'picklable': False,
}
model = ModelBase(encrypted_config, provider='cuda')
```

--------------------------------

### VideoProcessor Service Initialization Method

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/02_entry_points.md

Handles the asynchronous initialization of the transformation service within the VideoProcessor. It includes a delay for service spin-up and sets an initialization flag.

```python
def _initialize_service(self) -> None
```

--------------------------------

### Generate Video from Audio and Video (Python)

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/11_integration_examples.md

This Python script demonstrates how to programmatically generate a video using the TransDhTask service. It includes initialization, task processing, and result retrieval. A 10-second sleep is included after initialization, and a 60-second sleep is used to wait for task completion.
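A fixed 60-second sleep can be too short for a long video and wasteful for a short one. Polling the result dictionary is one alternative. The sketch below assumes only what these examples imply: that an entry for the task id appears in a dict such as `task.task_dic` once the task has finished, and that index 2 of that entry holds the output path. `wait_for_result` is a hypothetical helper, not part of the project, and the demo uses a plain dict filled in by a timer rather than a real `TransDhTask`.

```python
import threading
import time


def wait_for_result(task_dic, task_id, timeout=600.0, interval=2.0):
    """Poll task_dic until task_id has an entry or the timeout elapses.

    Returns the entry, or None if nothing appeared in time.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = task_dic.get(task_id)
        if result:
            return result
        time.sleep(interval)
    # One last check in case the entry arrived during the final sleep.
    return task_dic.get(task_id) or None


# Demo with a fake result dict; only index 2 (the output path) follows the examples.
fake_task_dic = {}
entry = (None, None, "result/task_1/task_1-r.mp4")
threading.Timer(0.2, lambda: fake_task_dic.__setitem__("task_1", entry)).start()

result = wait_for_result(fake_task_dic, "task_1", timeout=5.0, interval=0.05)
print(result[2] if result else "Processing failed or not completed")
```

With a real service, the call would look like `wait_for_result(task.task_dic, task_id)`, replacing the fixed `time.sleep(60)`.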
```python
import sys
import os
import time
from service.trans_dh_service import TransDhTask

os.chdir('/workspace/heygem-linux-python-hack')

# Initialize service
task = TransDhTask()
time.sleep(10)

# Process video
audio_path = 'data/speech.wav'
video_path = 'data/face.mp4'
task_id = 'test_task_1'
task.work(audio_path, video_path, task_id, 0, 0, 0, 0)

# Wait for completion
time.sleep(60)

# Retrieve result
result = task.task_dic.get(task_id)
if result:
    output_path = result[2]
    print(f"Generated video: {output_path}")
    print(f"Absolute path: {os.path.abspath(output_path)}")
else:
    print("Processing failed or not completed")
```

--------------------------------

### Import Dataset Utilities

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/10_api_index_and_glossary.md

Import functions for creating and finding datasets, specifically for L2FaceAudioDataset.

```python
from landmark2face_wy.data import create_dataset, find_dataset_using_name
from landmark2face_wy.data.l2faceaudio_dataset import L2FaceAudioDataset
```

--------------------------------

### Get Model Option Setter

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/06_neural_network_models.md

Retrieves a callable that modifies command-line options for a specific model. This is useful for adding model-specific arguments to a parser.

```python
def get_option_setter(model_name: str) -> callable:
    # ... implementation details ...
    pass
```

```python
from landmark2face_wy.models import get_option_setter
import argparse

opt_setter = get_option_setter("l2face")
parser = argparse.ArgumentParser()

# opt_setter adds model-specific options to the parser.
# Assumption: it has the modify_commandline_options(parser, is_train) signature
# documented for custom models, so is_train is passed explicitly.
modified_parser = opt_setter(parser, is_train=True)
```

--------------------------------

### List Model Files in Directory

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/08_error_handling_and_troubleshooting.md

Lists the ONNX model files present in the specified directory.
Use this to confirm that models have been downloaded and are accessible.

```bash
ls -lh face_detect_utils/resources/
```

--------------------------------

### Verify Input File and Model Paths

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/08_error_handling_and_troubleshooting.md

Checks the existence and permissions of input audio/video files and ONNX model files. Ensures that the program can access necessary resources.

```bash
ls -lh example/audio.wav
```

```bash
ls -lh example/video.mp4
```

```bash
ls -lh face_detect_utils/resources/*.onnx
```

```bash
ls -lh *.onnx
```

--------------------------------

### Create and Process Dataset

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/09_quick_reference.md

Creates a dataset using create_dataset and iterates through the first 10 batches, accessing image and audio data.

```python
from landmark2face_wy.data import create_dataset

opt = type('Options', (), {
    'dataset_mode': 'l2faceaudio',
    'img_size': 256,
    'name': 'identity_train',
    'feature_path': '/data/features',
    'audio_feature': 'mfcc',
    'batch_size': 4,
    'num_threads': 4,
    'max_dataset_size': 50000,
    'distributed': False,
    'serial_batches': False,
})()

dataset = create_dataset(opt, mode='train')
for i, batch in enumerate(dataset):
    if i >= 10:  # Process first 10 batches
        break
    images = batch['B']       # (4, 3, 256, 256)
    audio = batch['A_label']  # (4, 1, 256, 256)
```

--------------------------------

### Find Dataset Using Name Example

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/03_face_processing_pipeline.md

Demonstrates how to use the find_dataset_using_name function to retrieve a dataset class by its name. This function is case-insensitive and removes underscores from the dataset name.
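The matching rule (ignore case, drop underscores) can be sketched independently of the project code. The version below is an assumption about how such a lookup typically works, not the project's implementation: `find_by_name` and the two placeholder classes are hypothetical, and appending a `dataset` suffix is inferred from the fact that `"l2faceaudio"` resolves to `L2FaceAudioDataset`.

```python
class L2FaceAudioDataset:
    """Hypothetical placeholder standing in for the real dataset class."""


class AlignedDataset:
    """Second hypothetical placeholder, so the lookup has something to skip."""


def find_by_name(dataset_name, candidates):
    """Match '<name>dataset' against class names, ignoring case and underscores."""
    target = dataset_name.replace("_", "").lower() + "dataset"
    for cls in candidates:
        if cls.__name__.lower() == target:
            return cls
    raise NotImplementedError(f"No dataset class matches {dataset_name!r}")


classes = [AlignedDataset, L2FaceAudioDataset]
print(find_by_name("l2faceaudio", classes).__name__)
print(find_by_name("L2_Face_Audio", classes).__name__)
```

Both calls resolve to the same class, which is why `dataset_mode` values can be written with or without underscores.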
```python
from landmark2face_wy.data import find_dataset_using_name

DatasetClass = find_dataset_using_name("l2faceaudio")
# Returns: L2FaceAudioDataset class
```

--------------------------------

### Initialize and Use Service Task

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/09_quick_reference.md

Initializes the TransDhTask service, waits for it to be ready, and then processes a video file. Results are retrieved from a task dictionary.

```python
from service.trans_dh_service import TransDhTask
import time

# Create task
task = TransDhTask()

# Wait for initialization
time.sleep(10)

# Process video (arguments are positional: audio path, video path, task code, then four zeros)
task.work(
    "example/audio.wav",
    "example/video.mp4",
    "task_1",
    0, 0, 0, 0
)

# Retrieve result (check periodically)
result = task.task_dic.get("task_1")
if result:
    output_path = result[2]
    print(f"Generated: {output_path}")
```

--------------------------------

### Launch Gradio Video Processing Interface

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/02_entry_points.md

Sets up and launches a Gradio interface for video processing. Use this to provide a web UI for the video generation functionality. Ensure the VideoProcessor class is defined and imported.

```python
if __name__ == "__main__":
    processor = VideoProcessor()
    demo = gr.Interface(
        fn=processor.process_video,
        inputs=[
            gr.File(label="上传音频文件/upload audio file"),
            gr.File(label="上传视频文件/upload video file"),
        ],
        outputs=gr.Video(label="生成的视频/Generated video"),
        title="数字人视频生成/Digital Human Video Generation",
        description="上传音频和视频文件,即可生成数字人视频。/Upload audio and video files to generate digital human videos.",
    )
    demo.queue().launch()
```

--------------------------------

### Increase Gradio Initialization Timeout

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/08_error_handling_and_troubleshooting.md

Adds a delay in the Python script to increase the Gradio service initialization timeout.
Use this if the Gradio interface fails to start due to timeouts.

```python
# Increase initialization timeout in code
time.sleep(10)  # Instead of 5
```

--------------------------------

### Run Main Application with Custom Data

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/README_en.md

Executes the main Python script, allowing you to specify custom audio and video files using relative paths.

```bash
python run.py --audio_path example/audio.wav --video_path example/video.mp4
```

--------------------------------

### Main Batch Processing Function

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/02_entry_points.md

The main execution function for batch processing. It parses arguments, initializes services, and starts the video processing task. Designed for command-line execution.

```python
def main() -> None
```

```python
# Direct invocation:
if __name__ == "__main__":
    main()

# Usage:
# python run.py
# python run.py --audio_path data/speech.wav --video_path data/face.mp4
```

--------------------------------

### get_args()

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/02_entry_points.md

Parses command-line arguments for the video processing task. It sets up expected arguments for audio and video file paths.

```APIDOC
## get_args()

### Description
Parses command-line arguments for the video processing task, providing default paths for input audio and video files.

### Method
Python function call

### Parameters
#### Path Parameters
- **audio_path** (str) - Optional - Relative path to input audio file (WAV format). Defaults to "example/audio.wav".
- **video_path** (str) - Optional - Relative path to input video file (MP4 format). Defaults to "example/video.mp4".

### Response
#### Success Response
- **argparse.Namespace** - An object with `.audio_path` and `.video_path` attributes.

### Request Example
```python
from run import get_args

args = get_args()
print(args.audio_path)
print(args.video_path)
```
```

--------------------------------

### Validate GPU/CUDA Setup

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/09_quick_reference.md

Executes a Python script to check if ONNX Runtime is successfully utilizing the GPU. Expected output indicates successful GPU usage and the input shape.

```bash
python check_env/check_onnx_cuda.py
# Should output:
# ONNX Runtime is successfully using the GPU.
# (1, 3, 640, 640)
```

--------------------------------

### Minimal Python Integration

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/00_README.md

This snippet demonstrates a basic integration of the TransDhTask for audio-to-video processing. It includes task initiation, a delay for processing, and retrieving the result.

```python
from service.trans_dh_service import TransDhTask
import time

task = TransDhTask()
time.sleep(10)  # Wait for service initialization

task.work("audio.wav", "video.mp4", "task_1", 0, 0, 0, 0)
time.sleep(60)  # Wait for processing

result = task.task_dic.get("task_1")
if result:  # None if the task has not finished yet
    print(f"Output: {result[2]}")
```

--------------------------------

### Reinstall ONNX Runtime GPU

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/08_error_handling_and_troubleshooting.md

Uninstalls existing ONNX Runtime packages and installs a specific version. Use this when encountering ONNX Runtime initialization errors due to version mismatches.

```bash
pip uninstall onnxruntime-gpu onnxruntime -y
pip install onnxruntime-gpu==1.16.0
```

--------------------------------

### Initialize and Run Inference with ONNXModel

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/04_model_loading_and_inference.md

Shows how to instantiate `ONNXModel` for ONNX inference and perform a forward pass. Ensure input data matches the model's expected shape.
```python
from model_lib.base_wrapper import ONNXModel
import numpy as np

model = ONNXModel('/path/to/model.onnx', provider='cuda')

# Prepare input (must match model's expected shape)
input_data = np.random.randn(1, 3, 640, 640).astype(np.float32)

# Run inference
output = model.run(input_data)
# Output shape depends on model
```

--------------------------------

### Import Model Utilities

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/10_api_index_and_glossary.md

Import functions for creating and finding models, along with training options.

```python
from landmark2face_wy.models import create_model, find_model_using_name
from landmark2face_wy.options.train_options import TrainOptions
```

--------------------------------

### Initialize and Run Digital Human Transformation Task

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/04_model_loading_and_inference.md

Instantiates the TransDhTask and initiates an asynchronous video transformation process. The actual processing occurs in a background thread.

```python
import time

from service.trans_dh_service import TransDhTask

# The class is imported directly, so it is referenced by its bare name.
task = TransDhTask()

# Arguments are passed positionally: positional arguments may not
# follow keyword arguments in a Python call.
task.work(
    "example/audio.wav",   # audio_path
    "example/video.mp4",   # video_path
    "task_123",            # code
    0, 0, 0, 0,
)

# Wait for completion (check periodically)
time.sleep(30)

result = task.task_dic.get("task_123")
if result:
    output_path = result[2]  # Extract path from tuple
    print(f"Generated: {output_path}")
```

--------------------------------

### Process Dataset Batch

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/09_quick_reference.md

Creates a dataset using specified options and iterates through the first batch to access image and audio features. Ensure dataset path and name are correctly configured.
```python
from landmark2face_wy.data import create_dataset
import torch

opt = type('Options', (), {
    'dataset_mode': 'l2faceaudio',
    'img_size': 256,
    'batch_size': 4,
    'num_threads': 4,
    'max_dataset_size': 100,
    'distributed': False,
    'serial_batches': False,
    'feature_path': '/data',
    'name': 'test_train',
    'audio_feature': 'mfcc',
})()

dataset = create_dataset(opt, mode='train')

for batch in dataset:
    img = batch['B']          # (4, 3, 256, 256)
    audio = batch['A_label']  # (4, 1, 256, 256)
    break  # First batch only
```

--------------------------------

### L2FaceAudioDataset Initialization

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/03_face_processing_pipeline.md

Initializes the L2FaceAudioDataset for landmark-to-face synthesis with audio features. It requires an options object and a mode (train/test).

```python
class L2FaceAudioDataset(BaseDataset):
    def __init__(self, opt, mode: str = 'train') -> None
```

--------------------------------

### Verify ONNX Model Files

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/08_error_handling_and_troubleshooting.md

Lists the ONNX model files in the specified directories to verify their existence and permissions. Use this to troubleshoot model loading errors.

```bash
# Verify model files exist
ls -lh face_detect_utils/resources/*.onnx
ls -lh *.onnx
```

--------------------------------

### Data Flow Diagram

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/00_README.md

Illustrates the data flow within the service, from input files to the final output video, detailing the intermediate processing steps.
```text
Audio File + Video File
        ↓
[Service: TransDhTask]
  ├─ Face Detection
  ├─ Audio Feature Extraction
  ├─ Landmark Extraction
  ├─ Frame Generation
  └─ Video Encoding
        ↓
Output Video (with optional overlays)
```

--------------------------------

### Check and Create Output Directories

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/08_error_handling_and_troubleshooting.md

Ensures that output directories for results, temporary files, and logs exist. Creates them if they are missing.

```bash
mkdir -p result temp log
```

```bash
ls -ld result temp log
```

--------------------------------

### VideoProcessor Class Initialization

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/02_entry_points.md

Initializes the VideoProcessor, which wraps the transformation service lifecycle. The constructor includes an asynchronous initialization step for the service.

```python
class VideoProcessor:
    def __init__(self) -> None
```

```python
processor = VideoProcessor()
# Waits inside constructor for _initialize_service() call
```

--------------------------------

### Load and Inspect Dataset

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/11_integration_examples.md

Loads a dataset using `create_dataset` and inspects the first few batches for debugging. It shows how to access tensor shapes, data types, and min/max values, and visualizes the first sample image.
```python
from landmark2face_wy.data import create_dataset
import torch
import os

# Configuration
opt = type('Options', (), {
    'dataset_mode': 'l2faceaudio',
    'img_size': 256,
    'name': 'identity_train',
    'feature_path': '/data/features',
    'audio_feature': 'mfcc',
    'batch_size': 4,
    'num_threads': 4,
    'max_dataset_size': 1000,
    'distributed': False,
    'serial_batches': False,
})()

# Load dataset
print("Loading dataset...")
dataset = create_dataset(opt, mode='train')
print(f"Dataset size: {len(dataset)}")

# Inspect the first 5 batches
for batch_idx, batch in enumerate(dataset):
    if batch_idx >= 5:  # Process first 5 batches only
        break

    print(f"\n=== Batch {batch_idx} ===")
    for key, tensor in batch.items():
        if isinstance(tensor, torch.Tensor):
            print(f"{key}: shape={tensor.shape}, dtype={tensor.dtype}, "
                  f"min={tensor.min():.3f}, max={tensor.max():.3f}")

    # Visualize first sample in batch
    if batch_idx == 0:
        import numpy as np
        from PIL import Image

        # Denormalize image ([-1, 1] → [0, 255])
        img_tensor = batch['B'][0]  # First image in batch
        img_np = (img_tensor.cpu().numpy().transpose(1, 2, 0) + 1) / 2 * 255
        img_np = np.clip(img_np, 0, 255).astype(np.uint8)

        # Save sample
        Image.fromarray(img_np).save('sample_batch0_img0.jpg')
        print("Saved sample image: sample_batch0_img0.jpg")

print("\nDataset inspection complete")
```

--------------------------------

### Clone Project and Download Models

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/README_tts_f2f.MD

Clone the HeyGem-Linux-Python-Hack repository and download the necessary face-to-face (f2f) model using the provided script. Then, clone the tts-fish-speech repository and download its model using huggingface-cli.
```bash
# f2f
git clone https://github.com/Holasyb918/HeyGem-Linux-Python-Hack
cd HeyGem-Linux-Python-Hack
# Download the f2f model
bash download.sh

# tts
git clone https://github.com/Holasyb918/tts-fish-speech
cd tts-fish-speech
# Download the tts model
huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5/
```

--------------------------------

### Access and Use Global Configuration

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/09_quick_reference.md

Retrieves the singleton GlobalConfig instance and uses its paths to create output and temporary directories.

```python
from y_utils.config import GlobalConfig
import os

config = GlobalConfig.instance()

# Create output directories
os.makedirs(config.result_dir, exist_ok=True)
os.makedirs(config.temp_dir, exist_ok=True)

# Use paths
task_dir = os.path.join(config.result_dir, 'task_123')
```

--------------------------------

### Create Model Instance

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/06_neural_network_models.md

Factory function to instantiate a model based on configuration options. It uses `find_model_using_name` internally and requires an options object with a 'model' field.

```python
def create_model(opt) -> BaseModel:
    # ... implementation details ...
    pass
```

```python
from landmark2face_wy.models import create_model

opt = type('Options', (), {'model': 'l2face'})()
model = create_model(opt)
# Output: "model [L2FaceModel] was created"
```

--------------------------------

### Verify Input Video Files with FFmpeg

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/08_error_handling_and_troubleshooting.md

Tests the integrity and format of input audio and video files using FFmpeg.
```bash
# Verify input files
ffmpeg -i audio.wav
ffmpeg -i video.mp4
```

--------------------------------

### Import ONNX Runtime

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/10_api_index_and_glossary.md

Import the ONNX Runtime library for direct model inference.

```python
import onnxruntime
```

--------------------------------

### Accessing Task Results

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/07_types_and_configuration.md

Shows how to instantiate `TransDhTask`, initiate a work task, wait for completion, and retrieve the output path from the `task_dic`.

```python
import time

from service.trans_dh_service import TransDhTask

task = TransDhTask()
# The task code is passed positionally: positional arguments may not
# follow a keyword argument in a Python call.
task.work("audio.wav", "video.mp4", "task_123", 0, 0, 0, 0)

# Wait for completion
time.sleep(30)

# Retrieve result
result_tuple = task.task_dic.get("task_123")
if result_tuple:
    output_path = result_tuple[2]
    print(f"Generated: {output_path}")
```

--------------------------------

### Generate Video from Audio and Video Files

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/09_quick_reference.md

Initiates a video generation task using audio and video inputs. Requires a 10-second initial delay and a subsequent 30-second wait for task completion.

```python
from service.trans_dh_service import TransDhTask
import time

task = TransDhTask()
time.sleep(10)  # Initial delay for service initialization

task.work("audio.wav", "video.mp4", "task_1", 0, 0, 0, 0)
time.sleep(30)  # Wait for task completion

result = task.task_dic.get("task_1")
if result:  # None if the task has not finished yet
    print(f"Output: {result[2]}")
```

--------------------------------

### L2FaceAudio512Dataset Initialization

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/03_face_processing_pipeline.md

Initializes the L2FaceAudio512Dataset, a variant of L2FaceAudioDataset that uses 512x512 resolution images.
```python
class L2FaceAudio512Dataset(BaseDataset):
    def __init__(self, opt, mode: str = 'train') -> None
```

--------------------------------

### Load Model with Configuration

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/09_quick_reference.md

Loads a model using ModelBase, specifying the model path and input shape. Supports CUDA or CPU providers.

```python
from model_lib.model_base import ModelBase

model_config = {
    'model_path': 'face_detect_utils/resources/scrfd_500m_bnkps_shape640x640.onnx',
    'input_dynamic_shape': {'input': [1, 3, 640, 640]},
}

model = ModelBase(model_config, provider='cuda')
# Or: provider='cpu' for CPU-only
```

--------------------------------

### Dataset Options Configuration

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/09_quick_reference.md

Configure dataset parameters such as mode, image size, batch size, and feature paths.

```python
opt = type('Options', (), {
    'dataset_mode': 'l2faceaudio',
    'img_size': 256,
    'name': 'identity_name',
    'feature_path': '/path/to/features',
    'audio_feature': 'mfcc',
    'batch_size': 4,
    'num_threads': 4,
    'max_dataset_size': 50000,
    'distributed': False,
    'serial_batches': False,
})()
```

--------------------------------

### Directory Structure Overview

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/00_README.md

This snippet outlines the project's directory structure and the purpose of each markdown file, providing a map to different aspects of the system.
```markdown
heygem-linux-python-hack/
├── 01_architecture_overview.md               # System design and components
├── 02_entry_points.md                        # CLI and web interfaces
├── 03_face_processing_pipeline.md            # Face detection, parsing, restoration
├── 04_model_loading_and_inference.md         # Model infrastructure
├── 05_audio_processing.md                    # Audio feature extraction
├── 06_neural_network_models.md               # Model definitions and training
├── 07_types_and_configuration.md             # Configuration and data types
├── 08_error_handling_and_troubleshooting.md  # Error reference
├── 09_quick_reference.md                     # Quick start guide
├── 10_api_index_and_glossary.md              # Complete API reference
└── 11_integration_examples.md                # Code examples and advanced patterns
```

--------------------------------

### Python Function Signatures for Entry Points

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/10_api_index_and_glossary.md

Defines the entry point functions for the application, including argument parsing, video writing, and the main execution flow.

```python
def get_args() -> argparse.Namespace
def write_video(Queue, str, str, str, str, Queue, int, int, float, int, int) -> None
def main() -> None
```

--------------------------------

### Facereala3dmmexpwenet512Dataset Initialization

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/03_face_processing_pipeline.md

Initializes the Facereala3dmmexpwenet512Dataset, which combines 3DMM expression coefficients with WeNet audio features. This dataset is designed for multi-modal conditioning in video generation.

```python
class Facereala3dmmexpwenet512Dataset(BaseDataset):
    def __init__(self, opt, mode: str = 'train') -> None
```

--------------------------------

### Import Audio Feature Extraction

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/10_api_index_and_glossary.md

Import utilities for loading CMVN statistics and extracting audio features like MFCC.
```python
from wenet.utils.cmvn import load_cmvn
from wenet.tools._extract_feats import extract_mfcc  # Or similar
```

--------------------------------

### Access Global Config Attributes

Source: https://github.com/holasyb918/heygem-linux-python-hack/blob/main/_autodocs/09_quick_reference.md

Demonstrates accessing various configuration attributes from the GlobalConfig instance, such as output and temporary directory paths.

```python
from y_utils.config import GlobalConfig

config = GlobalConfig.instance()

config.result_dir          # Output directory
config.temp_dir            # Temporary directory
config.watermark_path      # Watermark image
config.digital_auth_path   # Digital auth badge
```
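
--------------------------------

### Check Config Paths Before Running (Sketch)

The attributes listed above can be validated before a task is started. The following is a minimal sketch, not part of the project: `prepare_config_paths` is a hypothetical helper, and a `SimpleNamespace` stands in for `GlobalConfig.instance()` so the example runs without the repository. It assumes only the four attribute names shown above (`result_dir`, `temp_dir`, `watermark_path`, `digital_auth_path`).

```python
import os
import tempfile
from types import SimpleNamespace


def prepare_config_paths(config):
    """Create output directories and list asset attributes whose files are missing.

    Hypothetical helper: `config` is any object exposing result_dir, temp_dir,
    watermark_path and digital_auth_path.
    """
    # Create the output and temporary directories if they do not exist yet.
    for directory in (config.result_dir, config.temp_dir):
        os.makedirs(directory, exist_ok=True)

    # Collect the names of asset attributes that do not point to a file.
    missing = []
    for attr in ("watermark_path", "digital_auth_path"):
        path = getattr(config, attr, None)
        if not path or not os.path.isfile(path):
            missing.append(attr)
    return missing


# Stand-in for GlobalConfig.instance(); all paths here are assumptions.
root = tempfile.mkdtemp()
demo_config = SimpleNamespace(
    result_dir=os.path.join(root, "result"),
    temp_dir=os.path.join(root, "temp"),
    watermark_path=os.path.join(root, "watermark.png"),
    digital_auth_path=os.path.join(root, "digital_auth.png"),
)

missing_assets = prepare_config_paths(demo_config)
print("Missing assets:", missing_assets)
```

In a real setup, the object returned by `GlobalConfig.instance()` could be passed in place of `demo_config`, provided it exposes the same attribute names.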