Method and System for Image Processing


Opportunity  

Object recognition in images is critical for robotics, computer vision, and AI-driven analysis. Traditional bottom-up or top-down strategies often suffer from high computational costs or insufficient accuracy. Many existing instance segmentation methods rely on bounding boxes and region-of-interest (RoI) pooling, which are computationally expensive and often include irrelevant background features. Vertex-based approaches use bounding box corners that frequently lie outside the actual object region, introducing noise. Furthermore, conventional methods fail to effectively distinguish between salient (important) and non-salient instances, leading to reduced confidence in predictions. There is a need for an efficient, geometrically constrained segmentation method that uses minimal key points to accurately delineate objects while incorporating saliency information to refine predictions.

Technology  

This patent presents a key-point-based salient instance segmentation network (KGDC-Net). The system receives an input image and uses a backbone network with a Feature Pyramid Network (FPN) to extract multi-level features. Instance-aware heads (classification, box, and dynamic generation heads) are attached to each FPN layer. The key innovation is the Key-point Guided Dynamic Convolution (KGDC) module. For each instance, it first identifies the centre point from the classification head. Using two 1×1 dynamic convolution layers combined with non-dynamic dilated convolutions, it predicts four peripheral points—the leftmost, rightmost, topmost, and bottommost points of the instance. These four points completely delineate any irregular shape at minimal cost.
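The appeal of the four peripheral points is geometric: the extreme pixels of an instance bound it exactly, even when bounding-box corners would fall on background. A minimal numpy sketch (illustrating the concept only, not the network's learned prediction) shows how the four extreme points of an irregular, L-shaped mask are recovered:

```python
import numpy as np

def extreme_points(mask):
    """Return the leftmost, rightmost, topmost, and bottommost
    foreground pixels of a binary mask as (row, col) pairs."""
    ys, xs = np.nonzero(mask)
    left   = (int(ys[np.argmin(xs)]), int(xs.min()))
    right  = (int(ys[np.argmax(xs)]), int(xs.max()))
    top    = (int(ys.min()), int(xs[np.argmin(ys)]))
    bottom = (int(ys.max()), int(xs[np.argmax(ys)]))
    return left, right, top, bottom

# An L-shaped (irregular) instance: the four extreme points lie on
# the object itself, whereas two of the bounding-box corners would
# sit on empty background.
mask = np.zeros((6, 6), dtype=int)
mask[1:5, 1] = 1   # vertical bar of the L
mask[4, 1:5] = 1   # horizontal bar of the L
left, right, top, bottom = extreme_points(mask)
```

Each extreme point is a genuine object pixel by construction, which is why they introduce no background noise, unlike box vertices.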

The KGDC module then selects central features (at the centre) and peripheral features (via weighted average using Gaussian heatmaps). A Differentiated Patterns Fusion (DPF) module computes distance vectors between peripheral and central features, generates weights via softmax, and fuses the features to create three 1×1 dynamic segmentation filters. These filters convolve with bottom features to produce instance masks.
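The fusion step can be sketched in a few lines. The weighting scheme below (softmax over the magnitude of each peripheral-minus-central distance vector, so more differentiated patterns get more weight) is a hypothetical simplification of the DPF module; the patented module produces three 1×1 dynamic segmentation filters rather than a single fused vector:

```python
import numpy as np

def dpf_fuse(central, peripherals):
    """Differentiated Patterns Fusion sketch.
    central:     feature vector at the centre point, shape (C,)
    peripherals: stacked peripheral feature vectors, shape (4, C)
    Weights each peripheral feature by the softmax of its distance
    from the central feature, then fuses the weighted sum back in."""
    diffs = peripherals - central              # distance vectors
    scores = np.linalg.norm(diffs, axis=1)     # one scalar per point
    w = np.exp(scores - scores.max())
    w /= w.sum()                               # softmax weights
    fused = central + (w[:, None] * peripherals).sum(axis=0)
    return fused, w

central = np.array([1.0, 0.0])
peripherals = np.array([
    [1.0, 0.0],   # identical to the centre -> small weight
    [3.0, 0.0],   # very different pattern  -> large weight
    [1.5, 0.5],
    [1.0, 0.2],
])
fused, weights = dpf_fuse(central, peripherals)
```

The design intuition from the source is that peripheral features differing most from the central feature carry the most complementary shape information, so they should dominate the fused filter.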

A High-Level Semantic Guidance Saliency (HSGS) module predicts an instance-agnostic saliency map using FPN features with CBAM attention. It computes a saliency score for each mask and multiplies it with the original classification score to produce a final confidence score. The entire network is trained end-to-end with classification, regression, peripheral point, mask, and saliency losses.
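The rescoring itself is simple arithmetic: an instance's saliency score scales its classification score, suppressing confident detections that fall on non-salient regions. The sketch below assumes the saliency score is the mean of the saliency map inside the instance mask (one plausible reading of "computes a saliency score for each mask"):

```python
import numpy as np

def final_confidence(cls_score, mask, saliency_map):
    """HSGS-style rescoring sketch: average the instance-agnostic
    saliency map over the instance mask, then multiply it with the
    classification score to get the final confidence (assumed form)."""
    sal = float(saliency_map[mask.astype(bool)].mean())
    return cls_score * sal

saliency = np.array([[0.9, 0.9, 0.1],
                     [0.9, 0.9, 0.1],
                     [0.1, 0.1, 0.1]])
salient_mask     = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]])
non_salient_mask = np.array([[0, 0, 1], [0, 0, 1], [0, 0, 1]])

conf_salient     = final_confidence(0.8, salient_mask, saliency)
conf_non_salient = final_confidence(0.8, non_salient_mask, saliency)
```

With identical classification scores of 0.8, the instance on the salient region keeps most of its confidence while the non-salient one is sharply down-weighted, which is how the module reduces false positives.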

Advantages  

  • Geometric Constraint via Peripheral Points: Four extreme points (min/max x,y) completely delineate any instance shape without requiring dense sampling or bounding box vertices.
  • Computationally Efficient: Uses 1×1 dynamic convolutions (few parameters) instead of 3×3, and avoids RoI pooling.
  • Differentiated Patterns Fusion: DPF module adaptively weights peripheral features based on their difference from central features, capturing diverse patterns.
  • Saliency-Aware Scoring: HSGS module refines confidence scores, reducing false positives from non-salient instances.
  • Region-of-Interest Free: No RoI cropping or alignment, simplifying the pipeline and reducing computation.
  • Robust to Irregular Shapes: Four peripheral points work for any shape (see Figure 7), unlike bounding boxes that include background.

Applications  

  • Autonomous Driving: Segmenting vehicles, pedestrians, and obstacles from camera feeds.
  • Robotics: Object detection and manipulation in cluttered environments.
  • Surveillance & Security: Identifying salient objects (people, bags, weapons) in crowded scenes.
  • Medical Imaging: Segmenting organs, tumours, or cells from MRI/CT scans.
  • Scene Understanding & Image Captioning: Generating descriptive captions by identifying salient instances.
Remarks
CIMDA: P00108
IP Status
Patent filed
Technology Readiness Level (TRL)
4
Questions about this Technology?
Contact Our Tech Manager