Action Recognition System and Method

Opportunity

Human action recognition from video is critical for applications such as surveillance, gaming, and human-computer interaction. A major challenge, however, is viewpoint variation: the subject's appearance changes drastically as the camera moves or the subject rotates. Existing methods rely on fixed, human-defined pre-processing (e.g., centering and aligning the skeleton) or require massive multi-view datasets. Such pre-processing strategies are not flexible enough for real-world scenarios like drone or surveillance footage, where the camera and subject move relative to each other, and they do not explicitly learn viewpoints that are optimal for action recognition, limiting their robustness and accuracy. A more adaptive, learnable approach is needed.

Technology

This patent presents an action recognition system with a View-Adaptive Neural Network that dynamically learns and corrects for viewpoint variations. The system receives 3D skeleton data (joint positions) from an RGB-D camera or a pose-estimation algorithm. For each frame, a View Adaptation Block applies an unsupervised learning algorithm to determine optimal transformation parameters (rotation angles α, β, γ and a translation vector b). It then transforms the entire skeleton with the corresponding 3D rotation matrix, effectively re-orienting the subject to a canonical "best view" without manual pre-processing.
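
The per-frame transformation described above can be sketched as follows. This is an illustrative NumPy sketch, not the patented implementation: the function names (`rotation_matrix`, `adapt_view`) and the example joint coordinates are assumptions, and in the actual system α, β, γ and b are produced by the learned View Adaptation Block rather than supplied by hand.

```python
import numpy as np

def rotation_matrix(alpha, beta, gamma):
    """Compose rotations about the x, y, and z axes (angles in radians)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def adapt_view(skeleton, alpha, beta, gamma, b):
    """Re-orient one frame of 3D joints: v' = R (v - b).

    skeleton: (J, 3) array of joint positions for one frame.
    b: (3,) translation vector; alpha/beta/gamma: learned rotation angles.
    """
    R = rotation_matrix(alpha, beta, gamma)
    return (skeleton - b) @ R.T

# Example: a toy 4-joint frame observed from an arbitrary viewpoint.
frame = np.array([[0.0, 0.0, 0.0],
                  [0.0, 0.5, 0.0],
                  [0.2, 0.5, 0.1],
                  [-0.2, 0.5, 0.1]])
canonical = adapt_view(frame, alpha=0.1, beta=-0.3, gamma=0.05,
                       b=frame.mean(axis=0))
```

Because the transform is rigid (rotation plus translation), bone lengths and relative joint geometry are preserved; only the viewing direction changes.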

The transformed skeleton sequence is then fed into a Graph Neural Network (GNN). The GNN converts the skeleton into a graph (joints=nodes, bones=edges). Using adaptive graph convolutions, it learns both the physical connectivity and latent relationships between joints. Multiple residual blocks process spatio-temporal features, and a classifier outputs the recognized action. The entire network (view adaptation + GNN) is trained end-to-end, allowing the view parameters to be optimized specifically for action recognition accuracy.
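
The graph construction and adaptive graph convolution can be illustrated with a minimal sketch. Assumptions here: NumPy only, a toy 5-joint skeleton, and a single layer with symmetric normalization and ReLU; `normalized_adjacency`, `graph_conv`, and the optional learned-adjacency term are hypothetical names standing in for the patent's adaptive graph convolutions.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A+I) D^-1/2."""
    A = np.eye(num_joints)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt

def graph_conv(X, A_hat, W, A_learned=None):
    """One adaptive graph-convolution layer.

    X: (J, C_in) joint features; W: (C_in, C_out) weights.
    A_learned: optional (J, J) data-driven adjacency capturing latent
    joint relationships, added to the fixed skeletal adjacency.
    """
    A_total = A_hat if A_learned is None else A_hat + A_learned
    return np.maximum(A_total @ X @ W, 0.0)  # ReLU activation

# Toy 5-joint skeleton: spine-base -> spine -> head, spine -> two arms.
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
A_hat = normalized_adjacency(edges, num_joints=5)
X = np.random.default_rng(0).normal(size=(5, 3))  # 3D coordinates as input features
W = np.random.default_rng(1).normal(size=(3, 8))
H = graph_conv(X, A_hat, W)
```

Stacking such layers (with residual connections, as in the patent's residual blocks) lets features propagate along both physical bones and learned latent links before classification.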

Advantages

  • View-Invariant Recognition: Dynamically adapts to any camera angle without manual pre-processing, significantly improving accuracy under viewpoint changes.
  • Unsupervised View Learning: The view adaptation block learns optimal transformations using only classification loss, without requiring ground-truth viewpoint labels.
  • End-to-End Training: The view adaptation and GNN are jointly optimized, leading to better feature learning than separate pre-processing pipelines.
  • Outperforms State-of-the-Art: Achieves higher accuracy on NTU60 (94.18% CV, 86.21% CS) than methods using fixed pre-processing (A-GCN-P: 92.70% CV, 84.30% CS).
  • Parameter Efficient: The view adaptation block adds minimal parameters (<0.3M) compared to stacking more GCN layers (+2.33M for 4 layers), yet yields better performance gains.
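
To make the end-to-end coupling concrete, below is a deliberately tiny numerical sketch in which a single view angle is updated using only the classification loss, with a finite-difference gradient standing in for backpropagation. The toy linear classifier, single z-axis rotation, and all variable names are assumptions for illustration; the patented system jointly trains the full View Adaptation Block and GNN.

```python
import numpy as np

rng = np.random.default_rng(0)

def rotate_z(skel, gamma):
    """Toy view adaptation: rotate all joints about the z-axis by gamma."""
    R = np.array([[np.cos(gamma), -np.sin(gamma), 0.0],
                  [np.sin(gamma),  np.cos(gamma), 0.0],
                  [0.0, 0.0, 1.0]])
    return skel @ R.T

def loss(gamma, W, skel, label):
    """Cross-entropy of a toy linear classifier on the rotated skeleton."""
    logits = rotate_z(skel, gamma).ravel() @ W
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return -np.log(p[label])

# Toy data: one 3-joint frame, 2 action classes, random classifier weights.
skel = rng.normal(size=(3, 3))
W = rng.normal(size=(9, 2))
label = 0

# The view angle receives gradient from the classification loss alone --
# no ground-truth viewpoint label is ever used.
gamma, lr, eps = 0.0, 0.05, 1e-5
loss_before = loss(gamma, W, skel, label)
for _ in range(200):
    grad = (loss(gamma + eps, W, skel, label)
            - loss(gamma - eps, W, skel, label)) / (2 * eps)
    gamma -= lr * grad
loss_after = loss(gamma, W, skel, label)
```

The same principle, applied by backpropagation through the full network, is what allows the view parameters to be optimized specifically for recognition accuracy.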

Applications

  • Surveillance & Security: Recognizing suspicious actions or gestures from fixed or moving cameras (e.g., CCTV, drones) regardless of viewpoint.
  • Human-Computer Interaction: Enabling gesture control for smart TVs, gaming consoles, or VR/AR headsets without requiring user-facing orientation.
  • Autonomous Vehicles: Recognizing pedestrian gestures or driver actions from vehicle-mounted cameras at varying angles.
  • Healthcare & Rehabilitation: Monitoring patient exercises or daily activities from bedside cameras without restricting camera placement.
  • Sports Analytics: Analyzing athlete movements from broadcast footage captured from multiple, changing camera angles.
Remarks

  • CIMDA: P00037
  • IP Status: Patent filed
  • Technology Readiness Level (TRL): 4
Questions about this Technology?
Contact Our Tech Manager