Opportunity
Video encoding systems, particularly end-to-end deep video compression (DVC) frameworks, face significant challenges in rate-distortion (R-D) optimization. A key problem is the limited number of operational R-D points, a consequence of the sparse set of trained rate-distortion models. This sparsity complicates rate control because it restricts the encoder's ability to hit a precise bitrate target while minimizing distortion. Existing methods often train multiple models to cover a wide range of bitrates, which is computationally expensive and impractical for real-time applications. In addition, conventional rate control algorithms in hybrid video coding frameworks (e.g., HEVC, VVC) rely on quantization parameters (QP) or Lagrange multipliers, which offer little flexibility for adapting dynamically to varying video content. This patent addresses these limitations with a scalable, adaptive encoding framework that expands the set of operational R-D points without requiring additional models, thereby improving rate control accuracy and compression efficiency.
Technology
The patent proposes a computer-implemented method for video encoding that leverages scaling-adaptive rate control and generalized rate-distortion models. The core innovation lies in dynamically adjusting the spatial resolution (via a rescaling parameter r) and the compression model (via a Lagrange multiplier λ) for each frame. The method involves:
- Bit Allocation: Allocating target bitrates for frames or groups of pictures (GOPs) using a sliding-window strategy.
- Encoding Parameter Determination: Selecting optimal {λ, r} pairs by minimizing distortion under bitrate constraints, using generalized models that relate bitrate (R), distortion (D), rescaling ratio (r), and λ. These models are expressed as:
  - Rate Model: \( R = f_1(\lambda, r) = \alpha_1 \cdot \lambda^{\beta_1} \cdot r^{\gamma_1} \)
  - Distortion Model: \( D = f_2(\lambda, r) = \alpha_2 \cdot \lambda^{\beta_2} \cdot r^{\gamma_2} \)
- Frame Processing: Rescaling frames (downsampling/upsampling) based on r and encoding them using the selected λ.
- Model Adaptation: Updating the rate-distortion models online using actual encoding results to improve accuracy for subsequent frames.
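The parameter-determination and model-adaptation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the class `PowerModel`, the function `select_params`, the coefficient values, the candidate rescaling ratios, the λ range, and the log-domain `update_alpha` rule are all hypothetical. Only the power-law form of the two models comes from the text. For each candidate r the rate model is inverted for λ, and the pair with the lowest predicted distortion is kept.

```python
import math
from dataclasses import dataclass


@dataclass
class PowerModel:
    """y = alpha * lam**beta * r**gamma, the form shared by the R and D models."""
    alpha: float
    beta: float
    gamma: float

    def predict(self, lam: float, r: float) -> float:
        return self.alpha * lam ** self.beta * r ** self.gamma

    def solve_lambda(self, target: float, r: float) -> float:
        # Invert y = alpha * lam**beta * r**gamma for lam at a fixed r.
        return (target / (self.alpha * r ** self.gamma)) ** (1.0 / self.beta)

    def update_alpha(self, lam: float, r: float, observed: float, step: float = 0.5) -> None:
        # Hypothetical log-domain correction; the source does not state the update rule.
        log_err = math.log(observed) - math.log(self.predict(lam, r))
        self.alpha *= math.exp(step * log_err)


def select_params(rate_model, dist_model, target_rate, r_candidates, lam_range):
    """Return (lam, r, predicted_distortion) with the least predicted distortion
    among candidates whose predicted rate stays within the target, or None."""
    best = None
    lam_min, lam_max = lam_range
    for r in r_candidates:
        # Lambda that hits the target rate at this resolution, clamped to the usable range.
        lam = min(max(rate_model.solve_lambda(target_rate, r), lam_min), lam_max)
        if rate_model.predict(lam, r) > target_rate * (1 + 1e-9):
            continue  # clamping pushed the predicted rate over budget
        dist = dist_model.predict(lam, r)
        if best is None or dist < best[2]:
            best = (lam, r, dist)
    return best


# Hypothetical coefficients: rate grows with lam and r, distortion shrinks with both.
rate_model = PowerModel(alpha=0.01, beta=0.5, gamma=1.2)
dist_model = PowerModel(alpha=50.0, beta=-0.6, gamma=-0.8)
lam, r, pred_dist = select_params(rate_model, dist_model, 0.1, [0.5, 0.75, 1.0], (1.0, 4096.0))

# Pretend the encoder produced 0.12 bits per pixel instead of the predicted 0.1.
rate_model.update_alpha(lam, r, observed=0.12)
```

Only α is updated here, to keep the sketch short. A fuller adaptation scheme could also refit β and γ from recent frames.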
This approach transforms sparse R-D points into denser operational points by introducing resolution flexibility, enabling better rate control without additional model training.
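To make the densification claim concrete, the sketch below enumerates predicted rates for a small set of λ values at a single resolution and again across several rescaling ratios. The coefficients, λ values, and ratios are hypothetical and chosen only for illustration. The helper names `operational_rates` and `largest_gap_ratio` are likewise assumptions, not terms from the patent.

```python
def operational_rates(alpha, beta, gamma, lambdas, ratios):
    """Sorted predicted rates R = alpha * lam**beta * r**gamma over all (lam, r) pairs."""
    return sorted(alpha * lam ** beta * r ** gamma for lam in lambdas for r in ratios)


def largest_gap_ratio(rates):
    """Largest ratio between neighbouring rates; smaller means denser coverage."""
    return max(hi / lo for lo, hi in zip(rates, rates[1:]))


lambdas = [256, 512, 1024, 2048]  # hypothetical set of trained models
fixed = operational_rates(0.01, 0.5, 1.2, lambdas, [1.0])
scaled = operational_rates(0.01, 0.5, 1.2, lambdas, [0.5, 0.625, 0.75, 0.875, 1.0])

print(len(fixed), largest_gap_ratio(fixed))
print(len(scaled), largest_gap_ratio(scaled))
```

With the same four models, allowing five rescaling ratios yields twenty predicted rate points instead of four, and the widest gap between neighbouring points shrinks under these assumed coefficients.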
Advantages
- Enhanced Rate Control: Achieves precise bitrate matching (e.g., <1% error in experiments) by dynamically adjusting r and λ.
- Improved R-D Performance: Outperforms fixed-resolution DVC methods, with average BD-rate savings of 8.8% (PSNR) and 12.98% (MS-SSIM).
- Computational Efficiency: Eliminates the need for multi-pass encoding or extensive model training.
- Flexibility: Supports variable resolutions per frame, enabling adaptive encoding for complex video content.
Applications
- Real-time video streaming (e.g., adaptive bitrate streaming for platforms like YouTube or Netflix).
- Video compression standards (e.g., integration with HEVC, VVC, or future codecs).
- Edge devices (e.g., mobile phones, IoT devices) where computational resources are limited.
- High-frame-rate (HFR) and high-dynamic-range (HDR) video encoding.
