Opportunity
The rapid development of video services, especially the widespread adoption of Ultra-High-Definition (UHD) video, has outpaced improvements in traditional compression technologies. Existing video coding standards (e.g., H.264/AVC, H.265/HEVC, and VVC) employ block-based hybrid compression frameworks but still struggle to compress high-resolution and high-dynamic-range video efficiently. In low-bandwidth networks and storage-limited environments in particular, transmitting and storing high-quality video efficiently becomes a critical issue. Additionally, traditional methods suffer from redundancy and inaccuracies in motion estimation and context modeling, which reduce compression efficiency and degrade reconstruction quality. These challenges present a significant opportunity for deep learning-based video compression technologies.
Technology
The patent proposes a deep learning-based double-enhanced video compression framework (Double Enhanced Modeling for Learned Video Compression) with two core innovations:
1. Enhanced Context Mining (ECM) Model: This model reduces redundancy across context channels through cross-channel interaction and residual learning. Specifically, ECM extracts cleaner context information from motion vectors and decoded frame features using convolution and residual operations, thereby improving the efficiency of the encoder and entropy model. ECM avoids batch normalization layers to maintain performance even with small batch sizes during training.
2. Transformer-based Post-Enhancement Backend Network: To limit error propagation and improve reconstruction quality, this network uses a full-resolution pipeline, avoiding the information loss caused by downsampling and upsampling. Its Transposed Gated Transformer Blocks (TGTBs) compute self-attention across channels rather than across spatial positions, which significantly reduces computational complexity and GPU memory usage compared with spatial self-attention. TGTB also removes layer normalization to further improve performance.
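The idea of attending across channels rather than spatial positions can be illustrated with a minimal NumPy sketch. It is not the patented TGTB: the function name `channel_self_attention`, the plain matrix projections `w_q`, `w_k`, `w_v` (standing in for 1x1 convolutions), and the L2 normalization of queries and keys are all assumptions made for illustration, and the gating and any multi-head structure are omitted. The point it shows is that each channel acts as one token, so the attention map is C x C regardless of frame size.

```python
import numpy as np

def channel_self_attention(x, w_q, w_k, w_v):
    """Self-attention over channels (illustrative sketch, not the patented block).

    x   : (C, H, W) feature map at full resolution
    w_* : (C, C) hypothetical projection weights, equivalent to 1x1 convolutions
    """
    c, h, w = x.shape
    tokens = x.reshape(c, h * w)          # each channel is one token of length H*W

    q = w_q @ tokens                      # (C, H*W)
    k = w_k @ tokens
    v = w_v @ tokens

    # Assumption: L2-normalize along the spatial axis so the logits stay bounded.
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-8)

    logits = q @ k.T                      # (C, C): independent of H and W
    logits = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)

    out = attn @ v                        # (C, H*W)
    return out.reshape(c, h, w), attn

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
y, attn = channel_self_attention(x, w_q, w_k, w_v)
print(y.shape, attn.shape)  # (8, 16, 16) (8, 8)
```

Because the output keeps the input's height and width, a block of this kind can be stacked in a full-resolution pipeline without any downsampling or upsampling step.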
Advantages
- Higher Compression Efficiency: Saves an average of 36.4% bitrate compared with traditional standards such as HEVC, with savings of up to 46.33% on UVG sequences.
- Reduced Error Propagation: The post-enhancement network significantly mitigates inter-frame error accumulation.
- Full-Resolution Processing: Avoids information loss and artifacts from downsampling/upsampling operations.
- Low Computational Overhead: Cross-channel self-attention enables high-resolution frame processing on a single GPU.
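To give a sense of scale for the overhead claim, the following back-of-the-envelope sketch compares the size of the attention map and the cost of the query-key product for spatial versus cross-channel attention. The 1920x1080 frame size and the channel count of 64 are illustrative assumptions, not figures from the patent, and the helper names are hypothetical.

```python
def attention_map_entries(height, width, channels):
    """Number of entries in the attention map for each attention style."""
    n = height * width                    # number of spatial positions
    return {"spatial": n * n, "channel": channels * channels}

def qk_multiply_adds(height, width, channels):
    """Multiply-adds needed to form the query-key product."""
    n = height * width
    return {
        "spatial": n * n * channels,          # (N x C) @ (C x N)
        "channel": channels * channels * n,   # (C x N) @ (N x C)
    }

# Assumed example: one full-resolution 1080p feature map with 64 channels.
entries = attention_map_entries(1080, 1920, 64)
macs = qk_multiply_adds(1080, 1920, 64)

print(entries["spatial"])                      # 4299816960000
print(entries["channel"])                      # 4096
print(macs["spatial"] // macs["channel"])      # 32400
```

Under these assumptions a spatial attention map would hold about 4.3 trillion entries, while the cross-channel map holds 4096, and the cost of the query-key product grows linearly rather than quadratically with the number of pixels. This is consistent with the claim that full-resolution frames can be processed on a single GPU.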
Applications
- UHD Video Streaming Services: e.g., 4K/8K live broadcasting and video-on-demand platforms.
- Cloud Storage & Edge Computing: Reduces costs for storage and data transmission.
- Real-Time Communication: Enhances video call quality in low-bandwidth environments.
- Medical & Satellite Imaging: Efficient compression for high dynamic range image data.
