Opportunity
Compressed sensing (CS) is a signal processing technique that allows signals to be accurately reconstructed from far fewer measurements than the Nyquist-Shannon sampling theorem traditionally requires. This capability is particularly valuable in applications such as medical imaging (e.g., MRI), single-pixel cameras, and snapshot compressive imaging, where lower sampling rates can significantly reduce cost and improve efficiency. However, traditional CS methods rely heavily on iterative optimization algorithms to enforce sparsity priors, which makes them computationally expensive and slow. Deep learning (DL)-based CS methods have emerged to address these limitations, but they often struggle to balance local detail preservation (via CNNs) with global context modeling (via transformers): existing CNN-based approaches cannot capture long-range dependencies, while pure transformer-based methods fail to retain low-level spatial details. This patent addresses these gaps by proposing a hybrid network that combines the strengths of CNNs and transformers for adaptive CS reconstruction.
Technology
The patent introduces CSformer, a hybrid framework combining convolutional neural networks (CNNs) and transformers for end-to-end compressed sensing.
The key innovation lies in its progressive reconstruction strategy and dual-branch architecture:
1. Sampling Module: Replaces traditional random sampling matrices with learned convolutional kernels, enabling content-aware measurement.
2. Reconstruction Module:
- Initialization Branch: Uses 1×1 convolutions and sub-pixel upsampling to generate an initial reconstruction, mimicking traditional CS but in a learnable manner.
- CNN Branch: Extracts local spatial features through cascaded convolutional blocks.
- Transformer Branch: Models global dependencies via window-based multi-head self-attention (MSA); its inputs are fused with features from the CNN branch.
3. Feature Aggregation: Transformer and CNN features are concatenated at multiple scales, followed by residual addition of the initial reconstruction to refine output details.
4. Windowed Transformers: Self-attention is computed within local windows rather than across the whole image, which reduces computational complexity and makes high-resolution reconstruction feasible.
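The sampling, initialization, and windowed-attention steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: a random Gaussian matrix `phi` stands in for the learned sampling kernels, its pseudo-inverse `psi` stands in for the learned 1×1 convolution, the attention uses a single head with identity query/key/value projections, and the block size, window size, and all function names are hypothetical.

```python
import numpy as np

def block_sample(image, phi, block):
    """Block-wise measurement. Equivalent to a stride-`block` convolution whose
    kernels are the rows of `phi` reshaped to block x block."""
    h, w = image.shape
    blocks = (image.reshape(h // block, block, w // block, block)
                   .transpose(0, 2, 1, 3)
                   .reshape(h // block, w // block, block * block))
    return blocks @ phi.T                      # (h/B, w/B, m)

def initial_reconstruction(measurements, psi, block):
    """Per-position linear map (a 1x1 convolution) followed by sub-pixel
    upsampling (pixel shuffle) back to the image grid."""
    hb, wb, _ = measurements.shape
    feats = measurements @ psi.T               # (h/B, w/B, B*B)
    return (feats.reshape(hb, wb, block, block)
                 .transpose(0, 2, 1, 3)
                 .reshape(hb * block, wb * block))

def window_self_attention(feats, window):
    """Single-head self-attention restricted to non-overlapping windows.
    Assumption: identity Q/K/V projections instead of learned ones."""
    h, w, c = feats.shape
    x = (feats.reshape(h // window, window, w // window, window, c)
              .transpose(0, 2, 1, 3, 4)
              .reshape(-1, window * window, c))      # (windows, tokens, c)
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(c)   # (windows, tokens, tokens)
    scores -= scores.max(axis=-1, keepdims=True)     # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ x
    return (out.reshape(h // window, w // window, window, window, c)
               .transpose(0, 2, 1, 3, 4)
               .reshape(h, w, c))

rng = np.random.default_rng(0)
block, ratio = 16, 0.10                        # assumed block size, 10% sampling rate
image = rng.random((64, 64))
m = round(ratio * block * block)               # measurements per block
phi = rng.standard_normal((m, block * block)) / np.sqrt(block * block)
psi = np.linalg.pinv(phi)                      # stand-in for the learned 1x1 conv

y = block_sample(image, phi, block)            # compressed measurements
x0 = initial_reconstruction(y, psi, block)     # coarse initial estimate
feats = rng.standard_normal((64, 64, 8))       # placeholder feature map
attended = window_self_attention(feats, window=8)
```

The saving from windowing is easy to quantify for this toy size: full self-attention over a 64×64 feature map compares 4096 × 4096 = 16,777,216 token pairs, whereas 8×8 windows compare 64 windows × 64 × 64 = 262,144 pairs, a 64-fold reduction.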
Advantages
- Superior Reconstruction Quality: Outperforms state-of-the-art methods (e.g., CSNet+, DPA-Net) by 1–2 dB in PSNR on datasets such as Urban100, especially at low sampling rates (1–10%).
- Robustness: Maintains performance under noisy measurements (tested with Gaussian noise σ = 0.01–0.5).
- Efficiency: Requires fewer FLOPs (18.4G) than pure CNN or pure transformer models (e.g., DPA-Net at 106G FLOPs).
- Flexibility: Adapts to dynamic textures and varying illumination conditions (validated on DTDB and MEF datasets).
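For readers less familiar with the metrics quoted above, the following sketch shows how PSNR is conventionally computed and how Gaussian noise of standard deviation σ might be added to measurements in a robustness test. It assumes intensities scaled to [0, 1] (so the peak value is 1.0); the function names and test arrays are hypothetical and are not taken from the patent's evaluation code.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

def add_gaussian_noise(measurements, sigma, rng):
    """Additive zero-mean Gaussian noise with standard deviation `sigma`."""
    return measurements + rng.normal(0.0, sigma, size=measurements.shape)

rng = np.random.default_rng(0)
reference = rng.random((32, 32))
estimate = reference + 0.1                     # uniform error of 0.1, so MSE = 0.01
score = psnr(reference, estimate)              # 10 * log10(1 / 0.01) = 20 dB

noisy = add_gaussian_noise(reference, sigma=0.01, rng=rng)
```

Because PSNR is logarithmic, a 1 dB gain corresponds to a mean squared error lower by a factor of about 1.26, and a 2 dB gain to a factor of about 1.58.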
Applications
- Medical Imaging: Accelerated MRI reconstruction with reduced scan times.
- Computational Photography: Single-pixel cameras for low-light or high-speed imaging.
- Satellite/Remote Sensing: Efficient data compression for transmission in bandwidth-constrained environments.
- Video Compression: Frame reconstruction from sparse measurements in real-time systems.
