### Stage 1: Recognize Audio/Video CLI Example Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/command_line.md Example of using the command-line interface to perform stage 1 recognition on an audio or video file. This stage processes the input and saves recognition results. ```bash python funclip/videoclipper.py --stage 1 \ --file examples/video.mp4 \ --output_dir ./output \ --lang zh ``` -------------------------------- ### Audio Input Example Source: https://github.com/modelscope/funclip/blob/main/_autodocs/types.md Demonstrates how to load an audio file using librosa and format it into the expected tuple structure for audio input. ```python import librosa wav, sr = librosa.load('audio.wav', sr=None) audio_input = (sr, wav) # e.g., (48000, array([...])) ``` -------------------------------- ### VideoClipper Argument Setup Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/argparse_tools.md Defines the complete argument parser setup for the VideoClipper, including a detailed list of all available arguments, their types, requirements, defaults, choices, and descriptions. ```APIDOC ## VideoClipper Argument Setup ### Description This section details the complete argument parser configuration for the VideoClipper, outlining all available command-line arguments. 
### Full Argument List | Argument | Type | Required | Default | Choices | Description | |----------|------|----------|---------|---------|-------------| | --stage | int | Yes | — | 1, 2 | Processing stage (1=recognize, 2=clip) | | --file | str | Yes | — | — | Input file path | | --sd_switch | str | No | "no" | "no", "yes" | Enable speaker diarization | | --output_dir | str | No | "./output" | — | Output directory path | | --dest_text | str | No | None | — | Text to clip (# separated) | | --dest_spk | str | No | None | — | Speaker to clip (# separated) | | --start_ost | int | No | 0 | — | Start offset (ms) | | --end_ost | int | No | 0 | — | End offset (ms) | | --output_file | str | No | None | — | Output file path | | --lang | str | No | "zh" | — | Language (zh, en) | ``` -------------------------------- ### Example Usage of Text2SRT Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/subtitle_utils.md Demonstrates initializing Text2SRT and calling its text() method to get a formatted string. ```python t2s = Text2SRT(['我', '是', 'AI'], [[0, 100], [100, 200], [200, 300]]) print(t2s.text()) # "我是 AI" ``` -------------------------------- ### Batch Processing Example Source: https://github.com/modelscope/funclip/blob/main/_autodocs/DOCUMENTATION_SUMMARY.txt Demonstrates patterns for batch processing multiple files using FunClip. This is efficient for handling large datasets. 
```python # Conceptual example for batch processing: # import os # for filename in os.listdir('input_dir'): # if filename.endswith('.wav'): # input_path = os.path.join('input_dir', filename) # output_path = os.path.join('output_dir', filename.replace('.wav', '.srt')) # # Call FunClip processing function here ``` -------------------------------- ### Launching the Web Interface Source: https://github.com/modelscope/funclip/blob/main/_autodocs/module-overview.md Starts a local web server for FunCLIP, allowing users to interact with the recognition and clipping features through a web browser. Specify language and port. ```bash python funclip/launch.py -l zh -m paraformer -p 7860 # Visit localhost:7860 in browser # Upload file → Recognize → Clip → Download ``` -------------------------------- ### Install FunClip Python Requirements Source: https://github.com/modelscope/funclip/blob/main/README.md Clone the FunClip repository and install its Python dependencies using pip. ```shell git clone https://github.com/alibaba-damo-academy/FunClip.git cd FunClip pip install -r ./requirements.txt ``` -------------------------------- ### Hotword Syntax Example Source: https://github.com/modelscope/funclip/blob/main/_autodocs/configuration.md Provides an example of space-separated terms for hotword configuration. This improves recognition accuracy for specific terms. ```text 热词1 热词2 热词3 ``` ```text 云栖大会 阿里巴巴 普惠设计 ``` -------------------------------- ### Install ffmpeg and imagemagick on Ubuntu Source: https://github.com/modelscope/funclip/blob/main/README.md Installs ffmpeg and imagemagick on Ubuntu systems and configures ImageMagick policy for read/write access. 
```shell apt-get -y update && apt-get -y install ffmpeg imagemagick sed -i 's/none/read,write/g' /etc/ImageMagick-6/policy.xml ``` -------------------------------- ### Parse Command Line Arguments Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/command_line.md Example of parsing command-line arguments using the get_parser() function. Demonstrates how to set stage, input file, and output directory. ```python from funclip.videoclipper import get_parser parser = get_parser() args = parser.parse_args([ '--stage', '1', '--file', 'video.mp4', '--output_dir', './output' ]) print(args.stage) # 1 print(args.file) # 'video.mp4' ``` -------------------------------- ### Example Usage of get_commandline_args Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/argparse_tools.md Shows how to use the get_commandline_args function to retrieve parsed command-line arguments. Assumes specific arguments were passed via the CLI and demonstrates accessing them. ```python from funclip.utils.argparse_tools import get_commandline_args args = get_commandline_args() # Assume CLI was: python script.py --stage 1 --file video.mp4 print(args.stage) # 1 print(args.file) # 'video.mp4' ``` -------------------------------- ### Example Usage of srt() Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/subtitle_utils.md Demonstrates initializing Text2SRT and calling its srt() method to generate an SRT formatted string. ```python t2s = Text2SRT(['我', '们'], [[500, 700]], offset=0) print(t2s.srt(acc_ost=0.0)) ``` -------------------------------- ### Timestamp List Format Example Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/command_line.md Example of the timestamp output format, represented as a Python list of [start, end] time pairs. ```python [[0, 100], [100, 200], [200, 300], ...] 
``` -------------------------------- ### Example Usage of time() Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/subtitle_utils.md Demonstrates initializing Text2SRT and calling its time() method to retrieve the time range with an applied offset. ```python t2s = Text2SRT(['我', '们'], [[500, 700]], offset=0) start, end = t2s.time(acc_ost=1.5) # Returns: (0.5 + 1.5, 0.7 + 1.5) = (2.0, 2.2) ``` -------------------------------- ### Argparse Setup for Gradio Launch Script Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/argparse_tools.md Sets up the argument parser for the Gradio launch script, defining arguments for language, ASR model, sharing, port, and listening options. ```python parser = argparse.ArgumentParser(description='argparse testing') parser.add_argument('--lang', '-l', type=str, default="zh", help="language") parser.add_argument('--model', '-m', type=str, default="paraformer", choices=["paraformer", "fun-asr-nano", "sensevoice"], help="ASR model") parser.add_argument('--share', '-s', action='store_true', help="if to establish gradio share link") parser.add_argument('--port', '-p', type=int, default=7860, help='port number') parser.add_argument('--listen', action='store_true', help="if to listen to all hosts") ``` -------------------------------- ### Initialize VideoClipper ArgumentParser Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/argparse_tools.md Initializes the ArgumentParser for VideoClipper, setting up the description and formatter class. This is part of the complete argument setup for the VideoClipper. 
```python
import argparse

from funclip.utils.argparse_tools import ArgumentParser


def get_parser():
    parser = ArgumentParser(
        description="ClipVideo Argument",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # The add_argument() calls and `return parser` follow in the complete setup.
```

--------------------------------

### Sentences List of Dicts Format Example

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/command_line.md

Example of the sentences output format, where each sentence is a dictionary containing text, timestamp, and speaker information.

```python
[{'text': [...], 'timestamp': [...], 'spk': 0}, ...]
```

--------------------------------

### Raw Text Output Example

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/command_line.md

Example of the plain text output format for recognition results.

```text
我们 的 设 计 能 力 。
```

--------------------------------

### Sentence Info Structure Example

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/types.md

Provides an example of the sentence information dictionary, showing recognized Chinese text tokens, their corresponding timestamps, and speaker ID.

```python
sentence = {
    'text': ['我', '爱', '自', '然', '语', '言', '处', '理'],
    'timestamp': [
        [100, 200], [200, 300], [300, 400], [400, 500],
        [500, 600], [600, 700], [700, 800], [800, 900]
    ],
    'spk': 0
}
```

--------------------------------

### Install imagemagick on macOS

Source: https://github.com/modelscope/funclip/blob/main/README.md

Installs imagemagick on macOS using Homebrew and configures its policy file.

```shell
brew install imagemagick
# BSD sed on macOS needs an explicit (here empty) backup suffix after -i.
# Adjust the version in the path to the one Homebrew actually installed.
sed -i '' 's/none/read,write/g' /usr/local/Cellar/imagemagick/7.1.1-8_1/etc/ImageMagick-7/policy.xml
```

--------------------------------

### Stage 1: Recognition with Speaker Diarization CLI Example

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/command_line.md

Example of using the command-line interface for stage 1 recognition with speaker diarization enabled, so that each recognized sentence is labeled with a speaker.
```bash python funclip/videoclipper.py --stage 1 \ --file examples/video.mp4 \ --output_dir ./output \ --sd_switch yes \ --lang zh ``` -------------------------------- ### State File Content Example Source: https://github.com/modelscope/funclip/blob/main/_autodocs/types.md State files created by write_state() store Python objects using repr() format. This example shows the content of a timestamp file. ```text [[0, 100], [100, 200], [200, 300]] ``` -------------------------------- ### Stage 2: Clip with Text Search CLI Example Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/command_line.md Example of using the command-line interface to perform stage 2 clipping based on a specific text search. This requires a previously completed stage 1 recognition. ```bash python funclip/videoclipper.py --stage 2 \ --file examples/video.mp4 \ --output_dir ./output \ --dest_text "待裁剪的文本" \ --start_ost 0 \ --end_ost 100 \ --output_file ./output/clipped.mp4 ``` -------------------------------- ### State Dictionary Example Source: https://github.com/modelscope/funclip/blob/main/_autodocs/types.md An example illustrating the structure and typical content of the state dictionary, including sample data for recognition results, timestamps, sentences, and optional video metadata. ```python state = { 'audio_input': (16000, np.array([...])), 'recog_res_raw': '我 爱 自 然 语 言 处 理', 'timestamp': [[0, 100], [100, 200], ...], 'sentences': [ { 'text': ['我', '爱'], 'timestamp': [[0, 100], [100, 200]], 'spk': 0 }, ... ], 'sd_sentences': [...], 'video_filename': 'video.mp4', 'clip_video_file': 'video_clip.mp4', 'video': VideoFileClip(...) } ``` -------------------------------- ### Valid Offset Syntax Examples Source: https://github.com/modelscope/funclip/blob/main/_autodocs/errors.md Illustrates the correct syntax for specifying offsets within the destination text for Funclip's `clip` method. 
```python
"我们的设计能力[100, 200]"  # Valid: offsets applied
"我们的设计能力[abc, def]"  # Invalid: non-numeric offsets
"我们的设计能力[100]"  # Invalid: missing second offset
```

--------------------------------

### Stage 2: Clip with Speaker Filter CLI Example

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/command_line.md

Example of using the command-line interface to perform stage 2 clipping based on a specific speaker ID. This requires a previously completed stage 1 recognition with speaker diarization enabled.

```bash
python funclip/videoclipper.py --stage 2 \
    --file examples/video.mp4 \
    --output_dir ./output \
    --dest_spk "spk0" \
    --output_file ./output/spk0.mp4
```

--------------------------------

### Launch Local FunClip Gradio Service

Source: https://github.com/modelscope/funclip/blob/main/README.md

Starts a local Gradio service for FunClip. Use '-m' to specify the ASR model, '-l' for language, '-p' for port, and '-s' for public access.

```shell
python funclip/launch.py
# '-m fun-asr-nano' for the Fun-ASR-Nano model (higher accuracy, 31 languages)
# '-m sensevoice' for the SenseVoice model (multilingual ASR + emotion + audio event detection)
# '-l en' for English audio recognition
# '-p xxx' for setting the port number
# '-s' for creating a public share link (a flag, takes no value)
```

--------------------------------

### SRT File Format Example

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/configuration.md

Shows the format of the total.srt file, including sequence numbers, timestamps, and recognized text. It can also include speaker information.

```plaintext
1
00:00:00,000 --> 00:00:02,500
识别文本

2 spk0
00:00:02,500 --> 00:00:05,000
带说话人信息的文本
```

--------------------------------

### Example Usage of Custom ArgumentParser

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/argparse_tools.md

Demonstrates how to use the custom ArgumentParser to define and parse command-line arguments for a video clipping task.
It shows adding required arguments and accessing parsed values.

```python
import argparse

from funclip.utils.argparse_tools import ArgumentParser

parser = ArgumentParser(
    description="ClipVideo Argument",
    formatter_class=argparse.ArgumentDefaultsHelpFormatter,
)
parser.add_argument(
    "--stage",
    type=int,
    choices=(1, 2),
    help="Stage, 1 for recognizing and 2 for clipping",
    required=True,
)
args = parser.parse_args()
print(f"Stage: {args.stage}")
```

--------------------------------

### Argument Parser Setup

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/DOCUMENTATION_SUMMARY.txt

The get_parser() function returns the argument parser object used by the CLI. This can be useful for understanding or extending the available command-line arguments.

```python
from funclip.command_line import get_parser

parser = get_parser()
args = parser.parse_args()
```

--------------------------------

### Launch FunClip with Fun-ASR-Nano or SenseVoice Models

Source: https://github.com/modelscope/funclip/blob/main/README.md

Run the first command to try FunClip with the Fun-ASR-Nano model for higher accuracy across 31 languages, or the second to use the SenseVoice model for emotion recognition and audio event detection.

```bash
python funclip/launch.py -m fun-asr-nano
```

```bash
python funclip/launch.py -m sensevoice
```

--------------------------------

### Launch FunClip with Arguments

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/INDEX.md

Use these arguments to launch the FunClip application, specifying language, model, port, and sharing options.

```bash
# -l: language (zh, en)
# -m: model (paraformer, fun-asr-nano, sensevoice)
# -p: port number
# -s: enable share link
# --listen: listen on 0.0.0.0
python funclip/launch.py \
    -l zh \
    -m paraformer \
    -p 7860 \
    -s \
    --listen
```

--------------------------------

### SRT Subtitle Format Example

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/command_line.md

Example of the SRT subtitle format, including sequence numbers and timestamps.
```text 1 00:00:00,000 --> 00:00:02,500 识别结果的第一句 2 spk0 00:00:02,500 --> 00:00:05,000 第二句(带说话人信息) ``` -------------------------------- ### Launch FunClip with Emotion and Event Detection Model Source: https://github.com/modelscope/funclip/blob/main/_autodocs/configuration.md Launches the Gradio web interface using the 'sensevoice' model, which supports emotion and audio event detection, via the '-m' argument. ```bash python funclip/launch.py -m sensevoice ``` -------------------------------- ### Listen on All Interfaces Launch Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/command_line.md Launches the Gradio interface to listen on all network interfaces (0.0.0.0) using the --listen flag. ```bash # All network interfaces python funclip/launch.py --listen ``` -------------------------------- ### SRT File Format Example Source: https://github.com/modelscope/funclip/blob/main/_autodocs/types.md Files with .srt extension use the standard SubRip format for subtitle timing and text. This example shows a single subtitle entry. ```text 1 00:00:00,000 --> 00:00:02,500 识别结果的第一句 ``` -------------------------------- ### Initialize and Recognize Audio with FunClip Source: https://github.com/modelscope/funclip/blob/main/_autodocs/README.md This pattern initializes the AutoModel and VideoClipper, setting the language to Chinese for recognition. It requires the `funclip` and `funasr` libraries. Ensure `audio_input` is defined. 
```python
from funclip.videoclipper import VideoClipper
from funasr import AutoModel

model = AutoModel(model="iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch")
clipper = VideoClipper(model)
clipper.lang = 'zh'
# audio_input must already be defined as a (sample_rate, waveform) tuple
res_text, res_srt, state = clipper.recog(audio_input)
```

--------------------------------

### Run Stage 1 (Recognition) with Videoclipper

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/INDEX.md

Configure and execute the first stage of the videoclipper script, which handles audio recognition. Be sure to specify the input file, output directory, and language.

```bash
# --sd_switch yes: enable speaker diarization
# --lang: language (zh, en)
python funclip/videoclipper.py \
    --stage 1 \
    --file input.mp4 \
    --output_dir ./output \
    --sd_switch yes \
    --lang zh
```

--------------------------------

### Launch with Fun-ASR-Nano Model and Chinese Language

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/argparse_tools.md

Launches the Gradio interface using the Fun-ASR-Nano model and Chinese language settings.

```bash
python funclip/launch.py -m fun-asr-nano -l zh
```

--------------------------------

### Launch FunClip for Network Access

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/configuration.md

Launches the Gradio web interface to listen on all network interfaces (0.0.0.0) instead of just localhost, enabling remote access via the '--listen' argument.

```bash
python funclip/launch.py --listen
```

--------------------------------

### Launch with Public Share Link

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/argparse_tools.md

Launches the Gradio interface and creates a temporary public share link, valid for 72 hours.
```bash python funclip/launch.py --share # Creates temporary public URL for 72 hours ``` -------------------------------- ### Launch FunClip for English Audio File Recognition Source: https://github.com/modelscope/funclip/blob/main/README.md Use this command to enable FunClip's ability to recognize and clip English audio files. ```bash python funclip/launch.py -l en ``` -------------------------------- ### Launch Gradio Interface Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/command_line.md Use this command to launch the Gradio web interface. Options can be appended to customize the launch. ```bash python funclip/launch.py [options] ``` -------------------------------- ### Launch Web Interface on Custom Port Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/argparse_tools.md Launches the Gradio web interface on a specified custom port. ```bash python funclip/launch.py --port 8080 ``` -------------------------------- ### CLI Entry Point Source: https://github.com/modelscope/funclip/blob/main/_autodocs/INDEX.md The main entry point for the FunCLIP command-line interface. ```APIDOC ## CLI Entry Point ### `runner(stage, file, ...)` #### Description Main CLI entry point that orchestrates the execution based on the specified stage and file. #### Parameters - `stage` (int): The processing stage (1 = recognize, 2 = clip). - `file` (str): The input file path. - `...`: Additional command-line arguments. #### Returns - `None` ### `get_parser()` #### Description Create and return the argument parser for the CLI. #### Returns - `ArgumentParser`: An instance of `argparse.ArgumentParser`. ### `get_commandline_args()` #### Description Parse the command-line arguments provided by the user. #### Returns - `Namespace`: An object containing the parsed command-line arguments. 
``` -------------------------------- ### Get Argument Parser Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/command_line.md Retrieves the ArgumentParser object used for command-line interface arguments. This is essential for understanding and utilizing the CLI options. ```python def get_parser(): pass ``` -------------------------------- ### Get Time Range as Seconds Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/subtitle_utils.md Returns the time range as a tuple of (start_seconds, end_seconds), with an optional time accumulation offset applied. ```python def time(self, acc_ost=0.0): pass ``` -------------------------------- ### Multiple Text Segments with Offsets Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/argparse_tools.md Clips multiple text segments from a video, allowing specification of start and end offsets for each segment. ```bash python funclip/videoclipper.py \ --stage 2 \ --file video.mp4 \ --output_dir ./asr_results \ --dest_text "第一段#第二段#第三段" \ --start_ost 100 \ --end_ost -50 \ --output_file ./multi_clips.mp4 ``` -------------------------------- ### Timestamp List Structure Source: https://github.com/modelscope/funclip/blob/main/_autodocs/types.md Represents time ranges for clipping video segments. Each element is a list containing the start and end time in milliseconds. ```python [[start_ms, end_ms], [start_ms, end_ms], ...] ``` ```python # Direct timestamps timestamp_list = [[500, 5850], [7120, 12940], [13240, 25620]] ``` ```python # From text matching from funclip.utils.trans_utils import proc, pre_proc ts = proc(recog_res_raw, timestamp, pre_proc("待裁剪文本")) # Returns frame-unit timestamps, convert to ms: [[start/16, end/16], ...] 
``` -------------------------------- ### Full Pipeline: Recognition and Clipping in Python Source: https://github.com/modelscope/funclip/blob/main/_autodocs/module-overview.md Initializes the AutoModel and VideoClipper, performs speech recognition on an audio file, and then clips the recognized content based on specified text. ```python from funasr import AutoModel from funclip.videoclipper import VideoClipper import librosa # Initialize model = AutoModel(model="iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch") clipper = VideoClipper(model) clipper.lang = 'zh' # Step 1: Recognize wav, sr = librosa.load('video.mp4', sr=16000) res_text, res_srt, state = clipper.recog((sr, wav)) # Step 2: Clip (sr, clipped), msg, srt = clipper.clip( dest_text="待裁剪文本", start_ost=0, end_ost=100, state=state ) ``` -------------------------------- ### Launch FunClip with High-Accuracy Multilingual Model Source: https://github.com/modelscope/funclip/blob/main/_autodocs/configuration.md Launches the Gradio web interface using the high-accuracy multilingual model 'fun-asr-nano' via the '-m' argument. ```bash python funclip/launch.py -m fun-asr-nano ``` -------------------------------- ### Launch Listening on All Network Interfaces Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/argparse_tools.md Launches the Gradio interface to listen on all network interfaces (0.0.0.0), allowing access from other machines on the network. ```bash python funclip/launch.py --listen # Allows access from other machines on the network ``` -------------------------------- ### Alibaba Qwen Configuration Source: https://github.com/modelscope/funclip/blob/main/_autodocs/configuration.md Specifies the service (DashScope/百炼) and supported models for Alibaba Qwen. Configuration requires setting the DashScope API key. 
```text
Service: DashScope (百炼)
Models Supported:
  - qwen_plus
  - qwen_max
  - qwen_turbo
API Key: Bailian platform API key
Configuration: Set via `dashscope.api_key = key`
```

--------------------------------

### Get Command Line Arguments

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/DOCUMENTATION_SUMMARY.txt

The get_commandline_args() function retrieves and parses the command-line arguments for FunClip. Make sure all required arguments (`--stage` and `--file`) are provided when running the CLI.

```python
from funclip.utils.argparse_tools import get_commandline_args

args = get_commandline_args()
print(args.file)  # input file path passed via --file
```

--------------------------------

### Command Line Interface (CLI)

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/DOCUMENTATION_SUMMARY.txt

Documentation for the command-line interface entry points and argument parsing.

```APIDOC
## Command Line Interface (CLI)

### Description
Entry points and utilities for interacting with FunClip via the command line.

### Functions

#### `runner()`
- **Description**: The main entry point for the CLI.
- **Parameters**: (Details not provided in source)
- **Returns**: (Details not provided in source)

#### `get_parser()`
- **Description**: Retrieves the argument parser configuration.
- **Parameters**: (Details not provided in source)
- **Returns**: (Details not provided in source)

### Stages

#### Stage 1 (Recognition)
- **Description**: Documentation for the recognition stage via CLI.

#### Stage 2 (Clipping)
- **Description**: Documentation for the clipping stage via CLI.

### Gradio Interface
- **Description**: Information on launching the Gradio interface.

### Usage Examples
- **Description**: Complete CLI usage examples are provided.
```

--------------------------------

### Main CLI Entry Point

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/DOCUMENTATION_SUMMARY.txt

The runner() function serves as the main entry point for the FunClip command-line interface (CLI).
Use this to execute FunClip from the terminal. ```python from funclip.command_line import runner if __name__ == "__main__": runner() ``` -------------------------------- ### Clip Video Without Subtitles Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/VideoClipper.md Use this example to clip a video segment based on specific text content without adding any subtitle overlay. Ensure the 'state' dictionary is correctly populated from video recognition. ```python # Clip video containing specific text clip_video_file, message, srt = clipper.video_clip( dest_text="待裁剪的文本", start_ost=100, end_ost=-50, state=state, add_sub=False ) ``` -------------------------------- ### LLM-Based Clipping with OpenAI API Source: https://github.com/modelscope/funclip/blob/main/_autodocs/module-overview.md Integrates with an LLM (like OpenAI's GPT) to get clipping suggestions based on subtitles. Parses the LLM response to extract timestamps and then uses these timestamps for precise clipping. ```python # After recognition (Pattern 1, Step 1) # Get LLM suggestions from funclip.llm.openai_api import openai_call from funclip.utils.trans_utils import extract_timestamps llm_response = openai_call( apikey="sk-...", model="gpt-3.5-turbo", system_content="你是视频剪辑助手...", user_content="这是待裁剪的视频SRT字幕:\n" + res_srt ) # Parse and clip timestamps = extract_timestamps(llm_response) (sr, clipped), msg, srt = clipper.clip( dest_text="", start_ost=0, end_ost=0, state=state, timestamp_list=timestamps ) ``` -------------------------------- ### VideoClipper Constructor Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/VideoClipper.md Initializes a VideoClipper instance with an optional FunASR model for speech recognition. ```APIDOC ## `__init__(funasr_model)` ### Description Initializes a VideoClipper instance with a FunASR model. 
### Parameters #### Path Parameters - **funasr_model** (AutoModel | None) - Required - FunASR AutoModel instance for speech recognition. Can be None if only performing clipping operations with pre-computed state. ### Request Example ```python from funasr import AutoModel from funclip.videoclipper import VideoClipper # Initialize with Chinese ASR model funasr_model = AutoModel( model="iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch", vad_model="damo/speech_fsmn_vad_zh-cn_16k-common-pytorch", punc_model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch", spk_model="damo/speech_campplus_sv_zh-cn_16k-common", ) clipper = VideoClipper(funasr_model) clipper.lang = 'zh' ``` ``` -------------------------------- ### Launch FunClip on Custom Port with Public Share Source: https://github.com/modelscope/funclip/blob/main/_autodocs/configuration.md Launches the Gradio web interface on a custom port (8080) and enables a public share link using '-p' and '-s' arguments. ```bash python funclip/launch.py -p 8080 -s ``` -------------------------------- ### Sentence Info Structure Definition Source: https://github.com/modelscope/funclip/blob/main/_autodocs/types.md Defines the structure for recognized text, including text content, token-level timestamps, and optional speaker information. Text can be a string or a list of tokens, and timestamps are provided as millisecond start and end times. ```python { 'text': str | list[str], 'timestamp': list[list[int, int]], 'spk': int | str # Optional } ``` -------------------------------- ### generate_srt_clip(sentence_list, start, end, begin_index=0, time_acc_ost=0.0) Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/subtitle_utils.md Generates SRT subtitles for a specific time range within a larger transcript. This function intelligently handles partial sentences at the boundaries and allows for sequential generation of subtitle clips. 
```APIDOC ## generate_srt_clip(sentence_list, start, end, begin_index=0, time_acc_ost=0.0) ### Description Generates SRT subtitles for a specific time range within the full transcript. ### Parameters #### Path Parameters - **sentence_list** (list[dict]) - Required - List of sentence dictionaries with 'text' and 'timestamp' keys. - **start** (float) - Required - Start time in seconds. - **end** (float) - Required - End time in seconds. - **begin_index** (int) - Optional - Default: 0 - Starting subtitle index number (for multiple clips, use previous index-1). - **time_acc_ost** (float) - Optional - Default: 0.0 - Time accumulation offset in seconds for multi-segment concatenation. ### Return Type `tuple (str, list[tuple], int)` Returns a tuple of: - SRT subtitle string for the clipped range - List of subtitle tuples: [((start_sec, end_sec), text), ...] - Next index number (for consecutive clips) ### Details Handles partial sentences that overlap with the clip boundaries. Splits tokens and timestamps appropriately. ### Example ```python from funclip.utils.subtitle_utils import generate_srt_clip sentences = [ { 'text': ['我', '们', '的', '设', '计', '能', '力'], 'timestamp': [[100, 200], [200, 300], [300, 400], [400, 500], [500, 600], [600, 700], [700, 800]] } ] # Get subtitles for time range 0.2s to 0.5s srt, subs, next_idx = generate_srt_clip( sentences, start=0.2, end=0.5, begin_index=0, time_acc_ost=0.0 ) print(srt) # Subtitle output with adjusted timestamps print(subs) # [((0.2, 0.5), '们的设计'), ...] print(next_idx) # 1 (next subtitle index) ``` ``` -------------------------------- ### Generate VAD Segments from Speaker Diarization Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/trans_utils.md Generates Voice Activity Detection (VAD) segments, including start and end times in seconds and the corresponding audio data, from speaker diarization results. Requires audio data and speaker diarization information. 
```python
from funclip.utils.trans_utils import generate_vad_data

# Assuming 'data' is an ndarray of audio data and
# 'sd_sentences' is a list of dicts with 'ts_list'
generate_vad_data(data, sd_sentences, sr=16000)
```

--------------------------------

### Public Share Link Launch

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/command_line.md

Launches the Gradio interface and creates a temporary public shareable link using the -s flag.

```bash
# Public share link (temporary)
python funclip/launch.py -s
```

--------------------------------

### Generate SRT subtitles for a time clip

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/subtitle_utils.md

Generates SRT subtitles for a specific time range (start to end in seconds) from a list of sentences. It handles partial sentences at boundaries and returns the SRT string, a list of subtitle tuples, and the next index number.

```python
from funclip.utils.subtitle_utils import generate_srt_clip

sentences = [
    {
        'text': ['我', '们', '的', '设', '计', '能', '力'],
        'timestamp': [[100, 200], [200, 300], [300, 400], [400, 500],
                      [500, 600], [600, 700], [700, 800]]
    }
]

# Get subtitles for time range 0.2s to 0.5s
srt, subs, next_idx = generate_srt_clip(
    sentences,
    start=0.2,
    end=0.5,
    begin_index=0,
    time_acc_ost=0.0
)

print(srt)       # SRT text with timestamps adjusted to the clip
print(subs)      # list of ((start_sec, end_sec), text) tuples
print(next_idx)  # next subtitle index (for consecutive clips)
```

--------------------------------

### Run Stage 2 (Clipping) with Videoclipper

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/INDEX.md

Configure and execute the second stage of the videoclipper script, which clips video content based on text. Specify the input file, output directory, target text, and timing offsets.
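The stage 2 invocation described above can also be assembled from Python as an argument list, which avoids shell quoting and line-continuation pitfalls. A minimal sketch: `build_stage2_command` is a hypothetical helper, not part of FunClip, and it uses only the flags listed in the argument reference.

```python
import shlex
import sys


def build_stage2_command(file, output_dir, phrases, start_ost=0, end_ost=0, output_file=None):
    """Assemble the stage 2 command line as an argument list (hypothetical helper)."""
    cmd = [
        sys.executable, "funclip/videoclipper.py",
        "--stage", "2",
        "--file", file,
        "--output_dir", output_dir,
        "--dest_text", "#".join(phrases),  # multiple phrases are '#'-separated
        "--start_ost", str(start_ost),     # milliseconds
        "--end_ost", str(end_ost),         # milliseconds
    ]
    if output_file is not None:
        cmd += ["--output_file", output_file]
    return cmd


cmd = build_stage2_command("input.mp4", "./output", ["text", "to", "find"],
                           end_ost=100, output_file="output.mp4")
# Show the arguments only; pass `cmd` to subprocess.run to actually execute it
print(shlex.join(cmd[1:]))
```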
```bash
# --start_ost and --end_ost are offsets in milliseconds
python funclip/videoclipper.py \
    --stage 2 \
    --file input.mp4 \
    --output_dir ./output \
    --dest_text "text#to#find" \
    --start_ost 0 \
    --end_ost 100 \
    --output_file output.mp4
```

--------------------------------

### Initialize VideoClipper with FunASR Model

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/VideoClipper.md

Initializes the VideoClipper with a FunASR AutoModel instance, which is required for speech recognition. Set the language attribute ('zh' or 'en') after initialization.

```python
from funasr import AutoModel
from funclip.videoclipper import VideoClipper

# Initialize with Chinese ASR model
funasr_model = AutoModel(
    model="iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    vad_model="damo/speech_fsmn_vad_zh-cn_16k-common-pytorch",
    punc_model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
    spk_model="damo/speech_campplus_sv_zh-cn_16k-common",
)

clipper = VideoClipper(funasr_model)
clipper.lang = 'zh'
```

--------------------------------

### SenseVoice Model Launch

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/command_line.md

Launches the Gradio interface using the SenseVoice model, which supports emotion and event detection, specified by the -m flag.

```bash
# SenseVoice (emotion + event detection)
python funclip/launch.py -m sensevoice
```

--------------------------------

### Handle Output Directory

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/argparse_tools.md

Strips trailing slashes from the output directory path and creates the directory if it does not exist.
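The directory handling described above can also be written more compactly. This is a minimal sketch rather than the project's code: `prepare_output_dir` is a hypothetical helper, and it assumes that creating missing parent directories is acceptable.

```python
import os
import tempfile


def prepare_output_dir(output_dir):
    """Strip trailing slashes and make sure the directory exists (hypothetical helper)."""
    # Keep the filesystem root if the path consisted only of slashes
    cleaned = output_dir.rstrip('/') or '/'
    # Unlike os.mkdir, this also creates missing parent directories
    os.makedirs(cleaned, exist_ok=True)
    return cleaned


# Use a throwaway directory so the example leaves nothing behind
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "output") + "//"
    result = prepare_output_dir(target)
    created = os.path.isdir(result)

print(result.endswith("output"), created)
```

Passing `exist_ok=True` removes the need for a separate existence check, which also avoids a race if another process creates the directory in between.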
```python
import os

output_dir = "./output/"  # example value; normally taken from --output_dir

# Strip any trailing slashes
while output_dir.endswith('/'):
    output_dir = output_dir[:-1]

# Create the directory if it is missing (os.mkdir creates a single level only)
if not os.path.exists(output_dir):
    os.mkdir(output_dir)
```

--------------------------------

### FunClip Runner Function Signature

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/command_line.md

Defines the main entry point for command-line video/audio clipping operations. It accepts parameters for processing stage, input file, output directory, clipping criteria, and language.

```python
def runner(stage, file, sd_switch, output_dir, dest_text, dest_spk,
           start_ost, end_ost, output_file, config=None, lang='zh'):
    pass
```

--------------------------------

### Recognition Stage CLI Parameters

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/configuration.md

Use these parameters for the recognition stage (Stage 1) of the videoclipper.py runner. Specify the input file and output directory; the example also enables speaker diarization and selects English.

```bash
--stage 1 --file input.mp4 --output_dir ./asr_results --sd_switch yes --lang en
```

--------------------------------

### Full Pipeline for Audio Clipping

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/INDEX.md

Use this pattern for a complete audio recognition and clipping process. It requires importing AutoModel and VideoClipper, and setting the language for recognition.
```python
import librosa
from funasr import AutoModel
from funclip.videoclipper import VideoClipper

model = AutoModel(model="iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch")
clipper = VideoClipper(model)
clipper.lang = 'zh'

# Load the input as a (sample_rate, waveform) tuple; 'audio.wav' is a placeholder path
wav, sr = librosa.load('audio.wav', sr=None)
audio_input = (sr, wav)

# Recognize
res_text, res_srt, state = clipper.recog(audio_input)

# Clip
(sr, audio), msg, srt = clipper.clip(
    dest_text="text",
    start_ost=0,
    end_ost=100,
    state=state
)
```

--------------------------------

### Fun-ASR-Nano Model Launch

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/command_line.md

Launches the Gradio interface using the Fun-ASR-Nano model, which supports 31 languages and offers higher accuracy, specified by the -m flag.

```bash
# Fun-ASR-Nano (31 languages, higher accuracy)
python funclip/launch.py -m fun-asr-nano
```

--------------------------------

### Minimal Recognition (Defaults)

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/api-reference/argparse_tools.md

Performs minimal recognition using default settings for output directory, language, and SD switch.

```bash
python funclip/videoclipper.py --stage 1 --file video.mp4
# Uses defaults: output_dir="./output", lang="zh", sd_switch="no"
```

--------------------------------

### g4f (Free) Configuration

Source: https://github.com/modelscope/funclip/blob/main/_autodocs/configuration.md

Details the configuration for using g4f (free service). No API key is required, but the service is noted as unstable.

```plaintext
Service: gpt4free
Models: Any model available in g4f (gpt-3.5-turbo, gpt-4, etc.)
API Key: None required
Warning: Unstable, may time out or fail. Retry recommended.
```
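Because the g4f service may time out or fail, a caller would typically wrap the request in a retry loop. The following is a minimal sketch, not part of FunClip: `retry_call` and `flaky_request` are hypothetical names, and the attempt count and backoff delays are arbitrary assumptions.

```python
import time


def retry_call(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or attempts run out (hypothetical helper)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:  # a real caller would catch narrower error types
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * (2 ** attempt))
    raise last_error


# Stand-in for an unstable request: fails twice, then succeeds
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


delays = []  # record the requested delays instead of actually sleeping
result = retry_call(flaky_request, attempts=5, base_delay=0.01, sleep=delays.append)
print(result, calls["count"], delays)
```

Injecting the `sleep` function keeps the example fast to run and makes the backoff schedule easy to inspect.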