### Install videopython

Source: https://videopython.com/

Install the core videopython library or the full package with all AI features. GPU is recommended for AI features.

```bash
pip install videopython  # core editing
```

```bash
pip install "videopython[ai]"  # + ALL local AI features (GPU recommended)
```

--------------------------------

### Basic VideoPython Installation

Source: https://videopython.com/getting-started/installation

Installs the core VideoPython package for basic video handling and processing using pip or uv.

```bash
pip install videopython
```

```bash
# Or with uv
uv add videopython
```

--------------------------------

### Install VideoPython with All AI Features

Source: https://videopython.com/getting-started/installation

Installs VideoPython with all AI-powered features, including generation, understanding, and dubbing, using pip or uv.

```bash
pip install "videopython[ai]"
```

```bash
# Or with uv
uv add videopython --extra ai
```

--------------------------------

### Install VideoDubber with Dubbing Extras

Source: https://videopython.com/api/ai/dubbing

Install the 'dub' extra for the core dubbing pipeline. Include 'tts' for default local speech synthesis.

```bash
pip install "videopython[dub]"      # pipeline WITHOUT local TTS
pip install "videopython[dub,tts]"  # + default local voice synthesis
```

--------------------------------

### Install FFmpeg on Ubuntu/Debian

Source: https://videopython.com/getting-started/installation

Installs FFmpeg using apt-get on Ubuntu/Debian systems. FFmpeg is a required prerequisite for VideoPython.

```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg
```

--------------------------------

### Install VideoPython with Specific AI Extras

Source: https://videopython.com/getting-started/installation

Installs VideoPython with specific AI capabilities like ASR, vision, separation, translation, TTS, or generation. Use granular extras for smaller, conflict-free installations.
```bash
pip install "videopython[asr]"  # just transcription
```

```bash
pip install "videopython[dub,tts]"  # dubbing with local TTS
```

--------------------------------

### Install FFmpeg on Windows

Source: https://videopython.com/getting-started/installation

Installs FFmpeg on Windows using Chocolatey. FFmpeg is a required prerequisite for VideoPython.

```bash
# Windows (with Chocolatey)
choco install ffmpeg
```

--------------------------------

### Audio Class Usage Examples

Source: https://videopython.com/api/core/audio

Demonstrates common operations with the Audio class, such as loading, creating, manipulating, and saving audio files.

```python
from videopython.audio import Audio

# Load from file
audio = Audio.from_path("music.mp3")

# Create silent track
silent = Audio.create_silent(duration_seconds=5.0, stereo=True)

# Basic operations
mono = audio.to_mono()
resampled = audio.resample(16000)
segment = audio.slice(start_seconds=1.0, end_seconds=5.0)

# Combine audio (audio1 and audio2 stand for two previously loaded Audio instances)
combined = audio1.concat(audio2, crossfade=0.5)
mixed = audio1.overlay(audio2, position=2.0)

# Save
audio.save("output.wav")
```

--------------------------------

### Initialize Pyannote Speaker Diarization Pipeline

Source: https://videopython.com/api/ai/understanding

Initializes the pyannote speaker diarization pipeline. Requires 'pyannote.audio' to be installed.
```python
def _init_diarization(self) -> None:
    """Initialize pyannote speaker diarization pipeline."""
    import torch

    from videopython.ai._optional import require

    Pipeline = require("pyannote.audio", "asr", feature="AudioToText diarization").Pipeline

    self._diarization_pipeline = Pipeline.from_pretrained(
        self.PYANNOTE_DIARIZATION_MODEL, revision=pinned(self.PYANNOTE_DIARIZATION_MODEL)
    )
    self._diarization_pipeline.to(torch.device(self.device))
```

--------------------------------

### Initialize and Use TextToMusic for Audio Generation

Source: https://videopython.com/api/ai/generation

Initializes the MusicGen model locally and generates audio from a text description. Ensure 'transformers' is installed for this feature.

```python
class TextToMusic(ManagedPredictor):
    """Generates music from text descriptions using MusicGen."""

    def __init__(self, device: str | None = None):
        self.device = device
        self._processor: Any = None
        self._model: Any = None

    def _init_local(self) -> None:
        """Initialize local MusicGen model."""
        import os

        from videopython.ai._optional import require

        _transformers = require("transformers", "generation", feature="TextToMusic")
        AutoProcessor = _transformers.AutoProcessor
        MusicgenForConditionalGeneration = _transformers.MusicgenForConditionalGeneration

        os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

        requested_device = self.device
        device = select_device(self.device, mps_allowed=True)

        model_name = "facebook/musicgen-small"
        self._processor = AutoProcessor.from_pretrained(model_name, revision=pinned(model_name))
        self._model = MusicgenForConditionalGeneration.from_pretrained(model_name, revision=pinned(model_name))
        self._model.to(device)
        self.device = device
        log_device_initialization(
            "TextToMusic",
            requested_device=requested_device,
            resolved_device=device,
        )

    def generate_audio(self, text: str, max_new_tokens: int = 256) -> Audio:
        """Generate music audio from text description."""
        if self._model is None:
            self._init_local()

        inputs = self._processor(text=[text], padding=True, return_tensors="pt")
        inputs = {k: v.to(self.device) if hasattr(v, "to") else v for k, v in inputs.items()}

        audio_values = self._model.generate(**inputs, max_new_tokens=max_new_tokens)
        sampling_rate = self._model.config.audio_encoder.sampling_rate
        audio_data = audio_values[0, 0].cpu().float().numpy()

        metadata = AudioMetadata(
            sample_rate=sampling_rate,
            channels=1,
            sample_width=2,
            duration_seconds=len(audio_data) / sampling_rate,
            frame_count=len(audio_data),
        )
        return Audio(audio_data, metadata)

    def unload(self) -> None:
        """Release the MusicGen model so the next generate_audio() re-initializes."""
        self._model = None
        self._processor = None
        release_device_memory(self.device)
```

--------------------------------

### Iterate Through Video Frames

Source: https://videopython.com/api/core/video

Use the `__iter__` method to get a generator that yields frame index and frame data. Frame indices are absolute and account for any start second offset.

```python
def __iter__(self) -> Generator[tuple[int, np.ndarray], None, None]:
    """Yield (frame_index, frame) tuples.

    Frame indices are absolute indices in the original video,
    accounting for any start_second offset.
    """
    self._iter = self._iter_frames()
    return self._iter
```

--------------------------------

### Create and Execute a Video Editing Plan

Source: https://videopython.com/api/editing

Demonstrates how to define a video editing plan using a dictionary, convert it to a VideoEdit object, perform a dry-run validation, and then execute the plan to save the output file. This is the primary method for creating and running edits.
```python
from videopython.editing import VideoEdit

plan = {
    "segments": [
        {
            "source": "input.mp4",
            "start": 5.0,
            "end": 12.0,
            "operations": [
                {"op": "crop", "width": 0.5, "height": 1.0, "mode": "center"},
                {"op": "resize", "width": 1080, "height": 1920},
                {
                    "op": "blur_effect",
                    "mode": "constant",
                    "iterations": 1,
                    "window": {"start": 0.0, "stop": 1.0},
                },
            ],
        },
        {"source": "input.mp4", "start": 20.0, "end": 28.0},
    ],
    "post_operations": [
        {"op": "color_adjust", "brightness": 0.05},
    ],
}

edit = VideoEdit.from_dict(plan)
predicted = edit.validate()  # dry-run via VideoMetadata
edit.run_to_file("output.mp4", crf=20, preset="medium")  # streams to disk (constant memory, any video length)
```

--------------------------------

### Calculate Audio Levels for a Segment

Source: https://videopython.com/api/core/audio

Calculates audio levels (RMS, peak, dB) for a specified time segment. Use this to analyze the loudness of a particular part of the audio. The example shows how to get levels for the entire audio file.

```python
def get_levels(
    self,
    start_seconds: float = 0.0,
    end_seconds: float | None = None,
) -> "AudioLevels":
    """Calculate audio levels for a segment.

    Args:
        start_seconds: Start time in seconds (default: 0.0)
        end_seconds: End time in seconds (default: None, meaning end of audio)

    Returns:
        AudioLevels with RMS, peak, and dB measurements

    Example:
        >>> audio = Audio.from_path("audio.mp3")
        >>> levels = audio.get_levels()
        >>> print(f"Peak: {levels.db_peak:.1f} dB")
    """
    from videopython.audio.analysis import AudioLevels

    segment = self.slice(start_seconds, end_seconds)
    data = segment.data.flatten() if segment.metadata.channels == 2 else segment.data

    rms = float(np.sqrt(np.mean(data**2)))
    peak = float(np.max(np.abs(data)))

    # Convert to dB (avoid log of zero)
    db_rms = 20 * np.log10(max(rms, 1e-10))
    db_peak = 20 * np.log10(max(peak, 1e-10))

    return AudioLevels(rms=rms, peak=peak, db_rms=float(db_rms), db_peak=float(db_peak))
```

--------------------------------

### Video.__init__

Source: https://videopython.com/api/core/video

Initializes a Video object with frames, frames per second, and optional audio.

```APIDOC
## Video.__init__

### Description
Initializes a Video object with the provided frames, frames per second (fps), and an optional Audio object. If no audio is provided, a silent audio track is created.

### Method
__init__

### Parameters
* **frames** (ndarray) - The video frames.
* **fps** (int | float) - The frames per second.
* **audio** (Audio | None) - Optional audio object.
```

--------------------------------

### Implement KenBurns Effect

Source: https://videopython.com/api/effects

Use the KenBurns effect to create a cinematic pan-and-zoom animation between two crop regions. It's useful for adding motion to still images or guiding the viewer's eye. Ensure start and end regions are within bounds and have valid dimensions.

```python
class KenBurns(Effect):
    """Cinematic pan-and-zoom that smoothly animates between two crop regions.

    Creates movement by transitioning from a start region to an end region
    over the clip. Use it to add motion to still images or to guide the
    viewer's eye across a scene.
    """

    op: Literal["ken_burns"] = "ken_burns"
    streamable: ClassVar[bool] = True

    start_region: BoundingBox = Field(
        description="Starting crop region as a BoundingBox with normalized 0-1 coordinates."
    )
    end_region: BoundingBox = Field(description="Ending crop region as a BoundingBox with normalized 0-1 coordinates.")
    easing: Literal["linear", "ease_in", "ease_out", "ease_in_out"] = Field(
        "linear",
        description=(
            'Animation curve. "linear" moves at constant speed, "ease_in" starts slow, '
            '"ease_out" ends slow, "ease_in_out" starts and ends slow.'
        ),
    )

    _stream_regions: np.ndarray | None = PrivateAttr(default=None)
    _stream_target_w: int = PrivateAttr(default=0)
    _stream_target_h: int = PrivateAttr(default=0)

    @model_validator(mode="after")
    def _validate_regions(self) -> KenBurns:
        for name, region in [("start_region", self.start_region), ("end_region", self.end_region)]:
            if not (0 <= region.x <= 1 and 0 <= region.y <= 1):
                raise ValueError(f"{name} position must be in range [0, 1]!")
            if not (0 < region.width <= 1 and 0 < region.height <= 1):
                raise ValueError(f"{name} dimensions must be in range (0, 1]!")
            if region.x + region.width > 1 or region.y + region.height > 1:
                raise ValueError(f"{name} extends beyond image bounds!")
        return self

    def _crop_and_scale_frame(
        self,
        frame: np.ndarray,
        x: int,
        y: int,
        crop_w: int,
        crop_h: int,
        target_w: int,
        target_h: int,
    ) -> np.ndarray:
        cropped = frame[y : y + crop_h, x : x + crop_w]
        return cv2.resize(cropped, (target_w, target_h), interpolation=cv2.INTER_LINEAR)

    def _precompute_regions(self, n_frames: int, width: int, height: int) -> np.ndarray:
        sx = int(self.start_region.x * width)
        sy = int(self.start_region.y * height)
        sw = int(self.start_region.width * width)
        sh = int(self.start_region.height * height)
        ex = int(self.end_region.x * width)
        ey = int(self.end_region.y * height)
        ew = int(self.end_region.width * width)
        eh = int(self.end_region.height * height)

        regions = np.empty((n_frames, 4), dtype=np.int32)
        eased = ease(np.arange(n_frames, dtype=np.float64) / max(1, n_frames - 1), self.easing)
        for i in range(n_frames):
            et = float(eased[i])
            crop_w = int(sw + (ew - sw) * et)
            crop_h = int(sh + (eh - sh) * et)
            x = max(0, min(int(sx + (ex - sx) * et), width - crop_w))
            y = max(0, min(int(sy + (ey - sy) * et), height - crop_h))
            regions[i] = (x, y, crop_w, crop_h)
        return regions

    def streaming_init(self, total_frames: int, fps: float, width: int, height: int, **_context: Any) -> None:
        self._stream_regions = self._precompute_regions(total_frames, width, height)
        self._stream_target_w = width
        self._stream_target_h = height

    def process_frame(self, frame: np.ndarray, frame_index: int) -> np.ndarray:
        assert self._stream_regions is not None
        idx = min(frame_index, len(self._stream_regions) - 1)
        x, y, cw, ch = self._stream_regions[idx]
        return self._crop_and_scale_frame(frame, x, y, cw, ch, self._stream_target_w, self._stream_target_h)
```

--------------------------------

### Initialize and Use TextToVideo for Video Generation

Source: https://videopython.com/api/ai/generation

Initializes the TextToVideo pipeline and generates a video from a text prompt. The pipeline is automatically initialized on the first call to generate_video if not already loaded. Ensure necessary libraries like 'diffusers' are installed.
```python
class TextToVideo(ManagedPredictor):
    """Generates videos from text descriptions using local diffusion models."""

    def __init__(self, device: str | None = None):
        self.device = device
        self._pipeline: Any = None

    def _init_local(self) -> None:
        from videopython.ai._optional import require

        CogVideoXPipeline = require("diffusers", "generation", feature="TextToVideo").CogVideoXPipeline

        requested_device = self.device
        device, dtype = _get_torch_device_and_dtype(self.device)

        model_name = "THUDM/CogVideoX1.5-5B"
        self._pipeline = CogVideoXPipeline.from_pretrained(model_name, revision=pinned(model_name), torch_dtype=dtype)
        self._pipeline.to(device)
        self.device = device
        log_device_initialization(
            "TextToVideo",
            requested_device=requested_device,
            resolved_device=device,
        )

    def generate_video(
        self,
        prompt: str,
        num_steps: int = 50,
        num_frames: int = 81,
        guidance_scale: float = 6.0,
    ) -> Video:
        """Generate video from text prompt."""
        import torch

        if self._pipeline is None:
            self._init_local()

        video_frames = self._pipeline(
            prompt=prompt,
            num_inference_steps=num_steps,
            num_frames=num_frames,
            guidance_scale=guidance_scale,
            generator=torch.Generator(device=self.device).manual_seed(42),
        ).frames[0]

        video_frames = np.asarray(video_frames, dtype=np.uint8)
        return Video.from_frames(video_frames, fps=16.0)

    def unload(self) -> None:
        """Release the diffusion pipeline so the next generate_video() re-initializes."""
        self._pipeline = None
        release_device_memory(self.device)
```

--------------------------------

### __init__

Source: https://videopython.com/api/ai/understanding

Initializes the semantic scene detector with configurable parameters for threshold, minimum scene length, and device selection.

```APIDOC
## __init__

### Description
Initializes the semantic scene detector with configurable parameters for threshold, minimum scene length, and device selection.

### Method
__init__

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Parameters
- **threshold** (float) - Optional - Confidence threshold for scene boundaries (0.0-1.0). Higher values = fewer, more confident boundaries. Default: 0.5
- **min_scene_length** (float) - Optional - Minimum scene duration in seconds. Default: 0.5
- **device** (str | None) - Optional - Device to run on ('cuda', 'mps', 'cpu', or None for auto). Note: MPS may have numerical inconsistencies; use 'cpu' for reproducible results. Default: None

### Request Example
None

### Response
#### Success Response (200)
None

#### Response Example
None
```

--------------------------------

### Install FFmpeg on macOS

Source: https://videopython.com/getting-started/installation

Installs FFmpeg using Homebrew on macOS. FFmpeg is a required prerequisite for VideoPython.

```bash
# macOS
brew install ffmpeg
```

--------------------------------

### Initialize VideoDubber with Flat Kwargs or DubbingConfig

Source: https://videopython.com/api/ai/dubbing

Demonstrates two ways to initialize VideoDubber: using flat keyword arguments for ad-hoc calls or by creating an explicit DubbingConfig object for reusable presets. The flat kwargs approach is recommended for quick, one-off operations.

```python
from videopython.ai.dubbing import DubbingConfig, VideoDubber

# Flat kwargs (recommended for ad-hoc calls)
dubber = VideoDubber(device="cuda", low_memory=True, whisper_model="large")

# Explicit config (recommended for reusable presets)
config = DubbingConfig(
    device="cuda",
    low_memory=True,
    whisper_model="large",
    translator="qwen3",
    vocabulary=["Klarna", "Allegro"],
)
dubber = VideoDubber(config=config)
```

--------------------------------

### Use VideoEdit with Dictionary Configuration

Source: https://videopython.com/getting-started/quickstart

Demonstrates creating a VideoEdit plan using a dictionary, which mirrors the JSON wire format.
This includes validation before running the edit and saving the output.

```python
from videopython.editing import VideoEdit

edit = VideoEdit.from_dict({
    "segments": [{
        "source": "input.mp4",
        "start": 0,
        "end": 10,
        "operations": [
            {"op": "resize", "width": 1280, "height": 720},
            {"op": "resample_fps", "fps": 30},
        ],
    }]
})

print(edit.validate())  # predicted VideoMetadata, no frames loaded
edit.run_to_file("output.mp4")
```

--------------------------------

### Initialize SceneVLM and AudioClassifier

Source: https://videopython.com/api/ai/video_analysis

Initializes SceneVLM and AudioClassifier based on configuration. Includes error handling for initialization failures, logging warnings if components cannot be loaded.

```python
scene_vlm: SceneVLM | None
try:
    scene_vlm = SceneVLM(**self.config.get_params(SCENE_VLM)) if SCENE_VLM in enabled else None
except (ImportError, OSError, RuntimeError, ValueError):
    logger.warning("Failed to initialize SceneVLM, skipping visual understanding", exc_info=True)
    scene_vlm = None

try:
    audio_classifier = (
        AudioClassifier(**self.config.get_params(AUDIO_CLASSIFIER)) if AUDIO_CLASSIFIER in enabled else None
    )
except (ImportError, OSError, RuntimeError, ValueError):
    logger.warning("Failed to initialize AudioClassifier, skipping audio classification", exc_info=True)
    audio_classifier = None
```

--------------------------------

### Load Videos from File, Segment, or Image

Source: https://videopython.com/getting-started/quickstart

Demonstrates how to load video files, specific segments of videos, or create videos from static images using the Video class. Also shows how to access basic video metadata.
```python
from videopython.base import Video

# Load from file
video = Video.from_path("input.mp4")

# Load a specific segment (more efficient for long videos)
video = Video.from_path("input.mp4", start_second=10, end_second=20)

# Create from a static image
import numpy as np

image = np.zeros((1080, 1920, 3), dtype=np.uint8)  # Black frame
video = Video.from_image(image, fps=24, length_seconds=3.0)

# Check video properties
print(video.metadata)  # 1920x1080 @ 30fps, 10.5 seconds
print(video.total_seconds)
print(video.frame_shape)  # (height, width, channels)
```

--------------------------------

### Get Per-Operation JSON Schema

Source: https://videopython.com/api/operations

Retrieve the JSON schema for a specific operation class. Use `llm_json_schema()` to get a schema suitable for LLM interactions, excluding server-only fields.

```python
from videopython.editing import Operation

cls = Operation.get("blur_effect")
schema = cls.model_json_schema()    # full (all fields)
llm_schema = cls.llm_json_schema()  # LLM-facing (llm_hidden dropped)
```

--------------------------------

### Initialize VideoDubber

Source: https://videopython.com/api/ai/dubbing

Initialize the VideoDubber with a configuration object or keyword arguments. If both are provided, a TypeError is raised. The configuration is logged upon initialization.

```python
class VideoDubber:
    """Dubs videos into different languages using the local pipeline.

    Accepts either a :class:`DubbingConfig` or the same knobs as flat kwargs
    (``device``, ``low_memory``, ``whisper_model``, ``translator``, etc.) --
    the flat path builds a ``DubbingConfig`` internally. See
    :class:`DubbingConfig` for the full knob list and defaults.
    """

    def __init__(
        self,
        config: DubbingConfig | None = None,
        *,
        tts_backend: SpeechBackend | None = None,
        **kwargs: Any,
    ):
        if config is not None and kwargs:
            raise TypeError("Pass either `config=` or knob kwargs, not both")
        self.config = config or DubbingConfig(**kwargs)
        # Optional injected speech backend. None -> the pipeline lazily builds
        # the local chatterbox-backed TextToSpeech (requires the [tts] extra).
        # Inject a SpeechBackend to dub with only [dub] installed.
        self._tts_backend = tts_backend
        self._local_pipeline: Any = None
        logger.info(
            "VideoDubber initialized with %s",
            " ".join(f"{k}={v}" for k, v in self.config.init_log_fields().items()),
        )
```

--------------------------------

### Apply Zoom Effect

Source: https://videopython.com/api/effects

Progressively zooms into or out of the frame center. The zoom factor must be greater than 1. 'in' mode starts wide and zooms in, 'out' mode starts tight and zooms out.

```python
class Zoom(Effect):
    """Progressively zooms into or out of the frame center over the clip duration."""

    op: Literal["zoom_effect"] = "zoom_effect"
    streamable: ClassVar[bool] = True

    zoom_factor: float = Field(
        gt=1,
        description="How far to zoom. 1.5 is a subtle push, 2.0 is moderate, 3.0+ is dramatic. Must be greater than 1.",
    )
    mode: Literal["in", "out"] = Field(
        description='"in" starts wide and pushes into the center, "out" starts tight and pulls back.',
    )

    _stream_crops: np.ndarray | None = PrivateAttr(default=None)
    _stream_width: int = PrivateAttr(default=0)
    _stream_height: int = PrivateAttr(default=0)

    def _crop_sizes(self, n_frames: int, width: int, height: int) -> np.ndarray:
        crop_w = np.linspace(width // self.zoom_factor, width, n_frames)
        crop_h = np.linspace(height // self.zoom_factor, height, n_frames)
        if self.mode == "in":
            crop_w, crop_h = crop_w[::-1], crop_h[::-1]
        return np.stack([crop_w, crop_h], axis=1)

    def streaming_init(self, total_frames: int, fps: float, width: int, height: int, **_context: Any) -> None:
        self._stream_crops = self._crop_sizes(total_frames, width, height)
        self._stream_width = width
        self._stream_height = height

    def process_frame(self, frame: np.ndarray, frame_index: int) -> np.ndarray:
        assert self._stream_crops is not None
        idx = min(frame_index, len(self._stream_crops) - 1)
        w, h = self._stream_crops[idx]
        width, height = self._stream_width, self._stream_height
        x = width / 2 - w / 2
        y = height / 2 - h / 2
        cropped = frame[round(y) : round(y + h), round(x) : round(x + w)]
        return cv2.resize(cropped, (width, height))
```

--------------------------------

### Normalize Scene Boundaries

Source: https://videopython.com/api/ai/video_analysis

Normalizes scene boundaries based on video metadata, ensuring start and end times/frames are within valid ranges and ordered correctly. Handles edge cases where end times are less than or equal to start times.

```python
def _normalize_scene_boundaries(self, scenes: list[SceneBoundary], metadata: VideoMetadata) -> list[SceneBoundary]:
    normalized: list[SceneBoundary] = []
    max_time = float(metadata.total_seconds)
    max_frame = int(metadata.frame_count)

    for item in scenes:
        start = max(0.0, min(max_time, float(item.start)))
        end = max(0.0, min(max_time, float(item.end)))
        if end <= start:
            continue

        start_frame = int(item.start_frame)
        end_frame = int(item.end_frame)
        start_frame = max(0, min(max_frame, start_frame))
        end_frame = max(0, min(max_frame, end_frame))
        if end_frame <= start_frame:
            start_frame = int(round(start * metadata.fps))
            end_frame = max(start_frame + 1, int(round(end * metadata.fps)))
            start_frame = max(0, min(max_frame, start_frame))
            end_frame = max(0, min(max_frame, end_frame))
            if end_frame <= start_frame:
                continue

        normalized.append(
            SceneBoundary(
                start=round(start, 6),
                end=round(end, 6),
                start_frame=start_frame,
                end_frame=end_frame,
            )
        )

    normalized.sort(key=lambda scene: (scene.start, scene.end))
    return normalized
```

--------------------------------

### __init__

Source: https://videopython.com/api/ai/understanding

Initializes the face tracker. Users can configure how faces are selected, smoothing factors, detection intervals, minimum face size, and the backend (CPU/GPU).

```APIDOC
## __init__

### Description
Initializes the face tracker.

### Method
__init__

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Parameters
- **selection_strategy** (Literal['largest', 'centered', 'index']) - Optional - How to select which face to track. Options include 'largest' (default), 'centered', or 'index'.
- **face_index** (int) - Optional - Index of face to track when using the 'index' strategy. Defaults to 0.
- **smoothing** (float) - Optional - Exponential moving average factor (0-1). Higher values result in smoother tracking. Defaults to 0.8.
- **detection_interval** (int) - Optional - Run detection every N frames and interpolate between detections. Defaults to 3.
- **min_face_size** (int) - Optional - Minimum face size in pixels required for detection. Defaults to 30.
- **backend** (Literal['cpu', 'gpu', 'auto']) - Optional - Specifies the detection backend. Can be 'cpu', 'gpu', or 'auto' (default).
- **sample_rate** (int) - Optional - For GPU backend, detect every Nth frame and interpolate. Only used by track_video(). Defaults to 1.
- **batch_size** (int) - Optional - Batch size for GPU detection. Defaults to 16.
- **iou_match_threshold** (float) - Optional - Minimum IoU between consecutive detections to continue an existing per-shot track. Used by `track_shot`. Defaults to DEFAULT_IOU_MATCH_THRESHOLD.
- **max_missed_frames** (int) - Optional - Maximum number of consecutive frames a track can go without detection before being closed. Defaults to DEFAULT_MAX_MISSED_FRAMES.

### Request Example
```json
{
  "selection_strategy": "largest",
  "face_index": 0,
  "smoothing": 0.8,
  "detection_interval": 3,
  "min_face_size": 30,
  "backend": "auto",
  "sample_rate": 1,
  "batch_size": 16,
  "iou_match_threshold": 0.5,
  "max_missed_frames": 10
}
```

### Response
#### Success Response (200)
This method initializes the tracker and does not return a value directly. The tracker object is configured for subsequent use.

#### Response Example
None (initialization)
```

--------------------------------

### slice

Source: https://videopython.com/api/core/audio

Extracts a portion of the audio between specified start and end times.

```APIDOC
## slice

### Description
Extracts a portion of the audio between specified start and end times.

### Method
This is a method of the Audio class.

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Parameters
- **start_seconds** (float): Start time in seconds (default: 0.0).
- **end_seconds** (float | None): End time in seconds (default: None, meaning end of audio).

### Returns
- **Audio**: New Audio instance with the extracted portion.

### Raises
- **ValueError**: If start_seconds or end_seconds are invalid.
```

--------------------------------

### Apply Basic Transformations with VideoEdit

Source: https://videopython.com/getting-started/quickstart

Shows how to use VideoEdit and SegmentConfig to apply transformations like resizing and frame rate resampling to a video segment. The edited video is then saved to a file.

```python
from videopython.editing import VideoEdit, SegmentConfig
from videopython.editing.transforms import Resize, ResampleFPS

edit = VideoEdit(segments=[
    SegmentConfig(
        source="input.mp4",
        start=0,  # cut the first 10 seconds...
        end=10,   # ...via the segment range, not a cut operation
        operations=[
            Resize(width=1280, height=720),
            ResampleFPS(fps=30),
        ],
    )
])
edit.run_to_file("output.mp4")
```

--------------------------------

### slice

Source: https://videopython.com/api/core/audio

Extract a portion of the audio between specified start and end times in seconds.

```APIDOC
## slice

### Description
Extract a portion of the audio between start_seconds and end_seconds.

### Method
```python
slice(start_seconds: float = 0.0, end_seconds: float | None = None) -> Audio
```

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Parameters
- **start_seconds** (`float`) - Optional - Start time in seconds (default: 0.0)
- **end_seconds** (`float | None`) - Optional - End time in seconds (default: None, meaning end of audio)

### Request Example
```python
# Example usage:
audio_segment = audio_object.slice(start_seconds=10.5, end_seconds=25.0)
```

### Response
#### Success Response (200)
- **Audio** (`Audio`) - New Audio instance with the extracted portion

#### Response Example
```json
{
  "audio_data": "...",
  "metadata": {
    "sample_rate": 44100,
    "channels": 2,
    "sample_width": 2,
    "duration_seconds": 14.5,
    "frame_count": 639450
  }
}
```

### Raises
- **ValueError**: If start_seconds or end_seconds are invalid
```

--------------------------------

### CutFrames

Source: https://videopython.com/api/transforms

Cuts a video segment by specifying the start and end frame numbers.

```APIDOC
## CutFrames

### Description
Cuts a video segment by specifying the start and end frame numbers.

### Method
Not specified (likely a method call on a video object)

### Endpoint
Not applicable (SDK method)

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Request Example
None provided

### Response
#### Success Response
Not specified

#### Response Example
None provided
```

--------------------------------

### Initialize Video Object

Source: https://videopython.com/api/core/video

Constructs a Video object with frames, fps, and optional audio. If no audio is provided, a silent audio track is generated.
```python
class Video:
    def __init__(self, frames: np.ndarray, fps: int | float, audio: Audio | None = None):
        self.frames = frames
        self.fps = fps
        if audio:
            self.audio = audio
        else:
            self.audio = Audio.create_silent(
                duration_seconds=round(self.total_seconds, 2), stereo=True, sample_rate=44100
            )
```

--------------------------------

### CutSeconds

Source: https://videopython.com/api/transforms

Cuts a video segment by specifying the start and end times in seconds.

```APIDOC
## CutSeconds

### Description
Cuts a video segment by specifying the start and end times in seconds.

### Method
Not specified (likely a method call on a video object)

### Endpoint
Not applicable (SDK method)

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Request Example
None provided

### Response
#### Success Response
Not specified

#### Response Example
None provided
```

--------------------------------

### Initialize Local SceneVLM Model

Source: https://videopython.com/api/ai/understanding

Initializes the local Qwen3.5 model and its processor. It handles device selection and ensures correct data types are used to avoid conflicts with other models.
```python
import time

import torch

from videopython.ai._optional import require
from videopython.core.logging import logger
from videopython.core.optional import pinned
from videopython.core.utils import select_device, log_device_initialization, release_device_memory


class SceneVLM:
    DEFAULT_MAX_IMAGE_PIXELS = 1024 * 1024

    def __init__(
        self,
        model_size: str,
        model_name: str | None = None,
        device: str | None = None,
        max_new_tokens: int = 128,
        temperature: float = 0.0,
        max_image_pixels: int | None = None
    ) -> None:
        self.model_size: SceneVLMModelSize = model_size
        self.model_name = model_name or SCENE_VLM_MODEL_IDS[model_size]
        self.device = device
        self.max_new_tokens = max_new_tokens
        self.temperature = temperature
        self.max_image_pixels = max_image_pixels if max_image_pixels is not None else self.DEFAULT_MAX_IMAGE_PIXELS

        self._processor: Any = None
        self._model: Any = None

        if model_size == "27b":
            self._warn_if_vram_under_large_model_floor()

    @staticmethod
    def _warn_if_vram_under_large_model_floor() -> None:
        """Loud WARNING when ``model_size='27b'`` is requested on a small card.

        Does not raise -- a knowledgeable user may run the 27B model with
        their own quantization layer or accept device off-loading. The
        warning makes the eventual OOM (deep inside ``from_pretrained``)
        easier to diagnose.
        """
        try:
            import torch

            if not torch.cuda.is_available():
                logger.warning(
                    "SceneVLM model_size='27b' requested but CUDA is not "
                    "available. 27B FP16 weights are ~54 GB; running on "
                    "CPU/MPS is likely to OOM."
                )
                return
            free_bytes, _total = torch.cuda.mem_get_info()
            free_gb = free_bytes / (1024**3)
            if free_gb < _LARGE_MODEL_VRAM_WARN_GB:
                logger.warning(
                    "SceneVLM model_size='27b' requested with %.1f GB free VRAM. "
                    "Qwen3.5-27B FP16 needs ~54 GB for weights alone -- expect "
                    "OOM during from_pretrained unless you wired up "
                    "quantization or device offloading.",
                    free_gb,
                )
        except ImportError:
            pass

    def _init_local(self) -> None:
        """Initialize local Qwen3.5 model."""
        import torch

        from videopython.ai._optional import require

        _transformers = require("transformers", "vision", feature="SceneVLM")
        AutoModelForImageTextToText = _transformers.AutoModelForImageTextToText
        AutoProcessor = _transformers.AutoProcessor

        t0 = time.perf_counter()
        requested_device = self.device
        resolved_device = select_device(self.device, mps_allowed=True)

        self._processor = AutoProcessor.from_pretrained(self.model_name, revision=pinned(self.model_name))
        # Save and restore default dtype -- transformers torch_dtype="auto" can
        # mutate torch.get_default_dtype(), which breaks concurrent models
        # (e.g. Whisper) that expect float32.
        saved_dtype = torch.get_default_dtype()
        try:
            self._model = AutoModelForImageTextToText.from_pretrained(
                self.model_name, torch_dtype="auto", revision=pinned(self.model_name)
            )
        finally:
            torch.set_default_dtype(saved_dtype)
        self._model.to(resolved_device)
        self._model.eval()
        self.device = resolved_device

        log_device_initialization(
            "SceneVLM",
            requested_device=requested_device,
            resolved_device=resolved_device,
        )
        logger.info(
            "SceneVLM(%s, model_size=%s) model weights loaded in %.2fs",
            self.model_name,
            self.model_size,
            time.perf_counter() - t0,
        )
```

--------------------------------

### Basic LLM Video Editing Workflow

Source: https://videopython.com/guides/llm-integration

Demonstrates the core workflow of generating a video edit plan with an LLM, validating it, and running it to a file. Ensure 'videopython.ai' is imported for AI operations.
```python
from videopython.editing import VideoEdit

schema = VideoEdit.json_schema()
plan = call_your_llm(schema=schema, prompt="Create a 15s highlight reel from input.mp4")

edit = VideoEdit.from_dict(plan)
predicted = edit.validate()  # catches bad plans before any I/O
print(predicted)

edit.run_to_file("output.mp4")
```

--------------------------------

### Get Overlay Opacity

Source: https://videopython.com/api/effects

Returns the opacity value for the overlay. This is a simple getter for the opacity attribute.

```python
return self.opacity
```

--------------------------------

### Get Supported Languages for Translation

Source: https://videopython.com/api/ai/dubbing

Retrieves a dictionary of supported languages for text translation, which can be used for dubbing.

```python
@staticmethod
def get_supported_languages() -> dict[str, str]:
    from videopython.ai.generation.translation import TextTranslator

    return TextTranslator.get_supported_languages()
```

--------------------------------

### Initialize FaceTracker and Load Audio

Source: https://videopython.com/api/ai/video_analysis

Initializes the FaceTracker and attempts to load audio from a given path. Includes error handling for initialization and audio loading failures, logging warnings if issues occur.
```python
face_tracker: FaceTracker | None = None
if FACE_TRACKER in enabled:
    try:
        face_tracker = FaceTracker(**self.config.get_params(FACE_TRACKER))
    except (ImportError, OSError, RuntimeError, ValueError):
        logger.warning("Failed to initialize FaceTracker, skipping face tracks", exc_info=True)
        face_tracker = None

path_audio: Audio | None = None
if audio_classifier is not None and source_path is not None:
    try:
        path_audio = Audio.from_path(source_path)
    except (OSError, RuntimeError, ValueError):
        logger.warning(
            "Failed to load audio from path, audio classification will use clip fallback",
            exc_info=True,
        )
        path_audio = None
```

--------------------------------

### Initialize TextToSpeech

Source: https://videopython.com/api/ai/generation

Initializes the TextToSpeech class for generating audio from text. This is a basic setup for the TTS functionality.

```python
from videopython.ai import TextToSpeech

tts = TextToSpeech()
```

--------------------------------

### Create Audio from File Path (Deprecated)

Source: https://videopython.com/api/core/audio

Use `Audio.from_path()` instead of this deprecated method. It warns the user about the deprecation.

```python
from_file(file_path: str | Path) -> Audio
```

```python
@classmethod
def from_file(cls, file_path: str | Path) -> Audio:
    """Deprecated: Use from_path() instead."""
    import warnings

    warnings.warn(
        "Audio.from_file() is deprecated, use Audio.from_path() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return cls.from_path(file_path)
```

--------------------------------

### Initialize and Generate Video from Image

Source: https://videopython.com/api/ai/generation

Initializes the local diffusion pipeline if not already loaded and then generates a video animation from a static image. Requires the 'diffusers' library.
```python
from typing import Any

from PIL import Image
import numpy as np

# Assuming Video and ManagedPredictor are defined elsewhere
# from videopython.core.video import Video
# from videopython.core.managed_predictor import ManagedPredictor
# from videopython.ai.generation.video import _get_torch_device_and_dtype, log_device_initialization, release_device_memory
# from videopython.core.pinned import pinned


class ImageToVideo(ManagedPredictor):
    """Generates videos from static images using local video diffusion."""

    def __init__(self, device: str | None = None):
        self.device = device
        self._pipeline: Any = None

    def _init_local(self) -> None:
        from videopython.ai._optional import require

        CogVideoXImageToVideoPipeline = require(
            "diffusers", "generation", feature="ImageToVideo"
        ).CogVideoXImageToVideoPipeline

        requested_device = self.device
        device, dtype = _get_torch_device_and_dtype(self.device)
        model_name = "THUDM/CogVideoX1.5-5B-I2V"
        self._pipeline = CogVideoXImageToVideoPipeline.from_pretrained(
            model_name, revision=pinned(model_name), torch_dtype=dtype
        )
        self._pipeline.to(device)
        self.device = device
        log_device_initialization(
            "ImageToVideo",
            requested_device=requested_device,
            resolved_device=device,
        )

    def generate_video(
        self,
        image: Image.Image,  # PIL image instance (Image itself is the module)
        prompt: str = "",
        num_steps: int = 50,
        num_frames: int = 81,
        guidance_scale: float = 6.0,
    ) -> Video:
        """Generate video animation from a static image."""
        import torch

        if self._pipeline is None:
            self._init_local()

        video_frames = self._pipeline(
            prompt=prompt,
            image=image,
            num_inference_steps=num_steps,
            num_frames=num_frames,
            guidance_scale=guidance_scale,
            generator=torch.Generator(device=self.device).manual_seed(42),
        ).frames[0]
        video_frames = np.asarray(video_frames, dtype=np.uint8)
        return Video.from_frames(video_frames, fps=16.0)

    def unload(self) -> None:
        """Release the diffusion pipeline so the next generate_video() re-initializes."""
        self._pipeline = None
        release_device_memory(self.device)
```

--------------------------------

### Initialize Local Whisper Model

Source: https://videopython.com/api/ai/understanding

Loads the specified Whisper model locally. Requires the 'whisper' library to be installed.

```python
def _init_local(self) -> None:
    """Initialize local Whisper model."""
    from videopython.ai._optional import require

    whisper = require("whisper", "asr", feature="AudioToText")
    # No revision pin: openai-whisper downloads weights by name from OpenAI's
    # own CDN, not via a HF from_pretrained repo, so there is no HF commit
    # SHA to pin (see videopython.ai._revisions module docstring).
    self._model = whisper.load_model(name=self.model_name, device=self.device)
```

--------------------------------

### Initialize Audio Object

Source: https://videopython.com/api/core/audio

Initializes an Audio object with provided numpy array data and AudioMetadata. The data should be normalized between -1 and 1.

```python
def __init__(self, data: np.ndarray, metadata: AudioMetadata):
    """
    Initialize Audio object

    Args:
        data: Audio data as numpy array, normalized between -1 and 1
        metadata: AudioMetadata object containing audio properties
    """
    self.data = data
    self.metadata = metadata
```

--------------------------------

### Get Audio Sample Count

Source: https://videopython.com/api/core/audio

Returns the total number of audio samples. This method is part of the Audio class.

```python
def __len__(self) -> int:
    """Returns the number of samples"""
    return self.metadata.frame_count
```

--------------------------------

### AudioEvent Duration Property

Source: https://videopython.com/api/ai/understanding

Calculates the duration of an AudioEvent in seconds. This property is derived from the start and end times of the event.
```python
@property
def duration(self) -> float:
    """Duration of the audio event in seconds."""
    return self.end - self.start
```

--------------------------------

### FaceTracker Initialization

Source: https://videopython.com/api/ai/understanding

Initializes the FaceTracker with a specified backend and logs the initialization.

```python
self._detector: _FaceDetector | None = None
self._last_position: tuple[float, float] | None = None
self._last_size: tuple[float, float] | None = None
self._smoothed_position: tuple[float, float] | None = None
self._smoothed_size: tuple[float, float] | None = None

logger.info("FaceTracker initialized with backend=%s", self.backend)
```
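
--------------------------------

The `AudioEvent.duration` property shown earlier in this section appears without its enclosing class, so it cannot be run on its own. The following is a minimal, self-contained sketch of how such a property behaves. It is a stand-in, not the library's actual `AudioEvent`: only `start`, `end`, and `duration` come from the documented snippet, and the `label` field is a hypothetical addition for illustration.

```python
from dataclasses import dataclass


@dataclass
class AudioEvent:
    """Stand-in for the library's AudioEvent (assumed shape, not the real class)."""

    start: float  # event start time, in seconds
    end: float  # event end time, in seconds
    label: str = "unknown"  # hypothetical field, for illustration only

    @property
    def duration(self) -> float:
        """Duration of the audio event in seconds."""
        return self.end - self.start


event = AudioEvent(start=1.5, end=4.0, label="applause")
print(f"{event.label}: {event.duration:.2f}s")  # prints "applause: 2.50s"
```

Because `duration` is computed from `start` and `end` on each access, it stays consistent if either timestamp is later adjusted.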