System and Method for Determining a Facial Expression

Opportunity

Facial expressions are a universal form of non-verbal communication, crucial for conveying emotions, agreement, or dissent. The ability of computer systems to automatically detect and interpret these expressions is highly desirable across numerous fields, including marketing, security, and human-computer interaction. However, implementing robust, real-time facial expression recognition (FER) in computing systems has proven exceptionally challenging. Many existing solutions rely on complex, deep convolutional neural networks (CNNs) that, while accurate, demand significant computational power and memory. This heavy resource requirement makes them difficult to deploy on devices with limited processing capabilities, such as smartphones, edge devices, or embedded systems, and often prevents them from operating effectively in real-time scenarios.

Furthermore, a core technical problem is that many FER methods do not effectively prioritize the most expressive parts of the face, such as the eyes, mouth, and nose. They often treat all facial regions equally, which can lead to reduced accuracy and efficiency. While some techniques use basic 2D binary masks to highlight facial areas, these masks fail to capture the crucial 3D geometrical and spatial information of facial features, especially under varying head poses, occlusions, and lighting conditions typical of real-world environments. Consequently, there is a pressing need for a FER system that is both lightweight and computationally efficient for real-time applications, while also being accurate and robust by intelligently focusing on the most informative 3D geometric facial features.

Technology

This patent introduces a novel system and method for efficient and accurate facial expression determination. The core innovation lies in the synergistic combination of a 3D Facial Point Mask with a lightweight, custom-designed neural network architecture called SqueezExpNet.

The system first uses a face extraction processor to isolate facial images. A facial mask generator then detects a plurality of facial points (e.g., 51 points representing the 3D geometry of the face) from these images. Unlike prior art, it generates a sophisticated 3D facial point mask by multiplying each facial point by a weight. This weight is not binary but is inversely proportional to the Euclidean distance of each image pixel from the corresponding facial point. Pixels farther from a key facial point receive proportionally lower weights. This creates a smooth, distance-based importance map that effectively encodes the 3D geometrical and spatial information of critical facial features (eyes, mouth, nose), making the system resilient to variations in pose and occlusion.
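As an illustration of the distance-based weighting described above, the sketch below builds a weight map from landmark coordinates. It is a minimal NumPy sketch, not the patented implementation: the nearest-point reduction, the `eps` smoothing term, and the max-normalization are all assumptions, and the patent's mask additionally encodes 3D geometry, whereas this sketch operates on 2D projected landmarks.

```python
import numpy as np

def facial_point_mask(image_shape, points, eps=1e-6):
    """Distance-weighted facial point mask (illustrative sketch).

    Each pixel's weight is inversely proportional to its Euclidean
    distance from the nearest detected facial point, so pixels near
    the eyes, mouth, and nose dominate the mask. `points` is an
    (n, 2) array of (x, y) landmark coordinates (hypothetical input
    format); `eps` avoids division by zero at the landmarks.
    """
    h, w = image_shape
    ys, xs = np.mgrid[0:h, 0:w]                          # pixel coordinate grid
    pixels = np.stack([xs, ys], axis=-1).astype(float)   # (h, w, 2) in (x, y) order
    # Distance from every pixel to every facial point: (h, w, n_points)
    dists = np.linalg.norm(
        pixels[..., None, :] - points[None, None, :, :], axis=-1
    )
    weights = 1.0 / (dists.min(axis=-1) + eps)           # inverse-distance weighting
    return weights / weights.max()                        # normalize to [0, 1]
```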

This 3D mask is then combined with the original facial image and fed as a dual input into the SqueezExpNet learning network. The network features a dual-stage structure with squeeze-and-expand blocks, inspired by the efficient SqueezeNet architecture but made shallower for faster computation. Crucially, a recurrent input classifier processes the feature maps from both stages together, capturing the spatial and geometric trends and relationships between them and effectively modeling the important dependencies between facial geometry and texture.
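The patent's exact SqueezExpNet layout is not reproduced here, but the squeeze-and-expand blocks it builds on follow the pattern of SqueezeNet's Fire module. The PyTorch sketch below shows that pattern; the channel sizes and the ReLU activations are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SqueezeExpandBlock(nn.Module):
    """Squeeze-and-expand block in the style of SqueezeNet's Fire module.

    A 1x1 "squeeze" convolution reduces the channel count, then parallel
    1x1 and 3x3 "expand" convolutions restore it; their outputs are
    concatenated along the channel axis. Channel widths are illustrative,
    not the patent's actual configuration.
    """
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=1)
        self.expand3 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.act(self.squeeze(x))                    # squeeze: reduce channels
        return torch.cat([self.act(self.expand1(s)),     # expand: parallel 1x1 / 3x3
                          self.act(self.expand3(s))], dim=1)
```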

The classification module concatenates these feature maps and outputs a probability distribution over predefined facial expressions (e.g., happy, sad, angry, surprised, as well as complex compound expressions like "happily surprised") using a softmax function. The use of weight decay and an adaptive learning-rate optimizer (ADADELTA) during training ensures robust convergence and prevents overfitting, a common problem in FER due to limited training data.
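A minimal sketch of such a classification head is shown below, assuming global average pooling before concatenation and PyTorch's built-in Adadelta optimizer. The channel widths, pooling choice, and weight-decay value are assumptions, and the recurrent processing of the two stages is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.optim as optim

class ExpressionHead(nn.Module):
    """Illustrative classification head: pools and concatenates the
    feature maps from the two stages, then maps the joint vector to
    expression logits. CrossEntropyLoss applies softmax internally
    during training; apply softmax explicitly at inference time."""
    def __init__(self, ch_stage1, ch_stage2, n_classes):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)               # global average pooling (assumption)
        self.fc = nn.Linear(ch_stage1 + ch_stage2, n_classes)

    def forward(self, f1, f2):
        v = torch.cat([self.pool(f1).flatten(1),
                       self.pool(f2).flatten(1)], dim=1)  # concatenate stage features
        return self.fc(v)                                  # expression logits

head = ExpressionHead(128, 256, n_classes=7)               # channel sizes assumed
criterion = nn.CrossEntropyLoss()                          # softmax + NLL during training
optimizer = optim.Adadelta(head.parameters(), weight_decay=1e-4)  # ADADELTA with weight decay
```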

Advantages

  • High Computational Efficiency: The SqueezExpNet architecture is significantly lighter and faster than traditional DCNN models like AlexNet or ResNet, making it suitable for real-time applications on resource-constrained devices.
  • Superior Accuracy: Demonstrated state-of-the-art performance on multiple benchmark datasets (RaFD, CFEE, RAFDB). For instance, it achieved 97.12% accuracy on RaFD and 93.85% on CFEE for basic expressions, outperforming existing methods.
  • Effective 3D Geometric Encoding: The 3D facial point mask intelligently incorporates spatial and geometrical information, allowing the network to focus on the most expressive facial parts, improving robustness against head poses, occlusions, and illumination changes.
  • Handles Compound Expressions: The system is capable of classifying not just basic expressions but also complex compound expressions (e.g., happily surprised, fearfully angry), which are more realistic and challenging, achieving an 89.09% average accuracy on the CFEE compound set.
  • Robust Generalization: Demonstrates strong cross-dataset performance, indicating its ability to generalize from controlled lab environments to uncontrolled, real-world images, as shown in the cross-database study (e.g., training on RaFD and testing on CFEE).

Applications

  • Consumer Electronics: Enabling emotionally aware smartphones, smart home devices, and wearables that can adapt their responses based on the user's mood.
  • Human-Computer Interaction (HCI): Developing more intuitive user interfaces, emotionally responsive video game characters, and realistic avatars for virtual reality (VR) and augmented reality (AR).
  • Automotive: Driver monitoring systems that detect driver fatigue, distraction, or aggression (e.g., anger) by analyzing facial expressions to enhance road safety.
  • Mental Healthcare & Assistive Technology: Aiding therapists in analyzing patient expressions for conditions like autism or depression, or providing communication tools for individuals with conditions that impair verbal expression.
  • Market Research & Advertising: Analyzing the spontaneous facial expressions of consumers in response to advertisements, products, or movie trailers to gain deep, unbiased insights into their emotional engagement and preferences.

Remarks

CIMDA: P00002
IP Status: Patent filed
Technology Readiness Level (TRL): 4

Questions about this Technology?
Contact Our Tech Manager