Data Analytics System and Its Operation

Opportunity  

In today's data-driven society, critical sectors such as healthcare, business, and government rely heavily on data availability and advanced analytics for accurate decision-making. However, a significant barrier exists: data is often fragmented and stored locally by individuals, and widespread concerns about data leakage and unauthorized sharing discourage owners from contributing it. Traditional data-sharing models involve a fundamental loss of control: once data is shared, it can be copied, traded, or misused in ways the owner cannot control. Existing privacy frameworks that allow owners to define usage policies often require trusted hardware for enforcement, which is not universally deployed. Systems that rely on centralized trusted authorities for policy enforcement are themselves vulnerable to data breaches and misuse.

A more insidious problem is metadata leakage. Even if data is encrypted, an attacker can infer sensitive information about a data owner by observing which analytic queries access their data. For instance, if a query is known to come from a psychiatrist, learning that a particular individual's data was used suggests that the individual may have a mental health condition. Existing techniques for hiding data access patterns, such as Oblivious RAM (ORAM), are designed for single-owner settings, rely on trusted proxies, or incur prohibitive computational costs in multi-owner, real-time data stream environments.

There is therefore a pressing need for a practical, scalable data analytics system that ensures end-to-end data confidentiality, enforces owner-defined usage policies without centralized trust, and protects against metadata leakage so that participant privacy is fully preserved.

Technology  

The patent discloses a metadata-hiding data analytics system, exemplified by an embodiment called "Vizard." The core innovation is a two-server architecture that uses lightweight cryptographic primitives to enable privacy-preserving, policy-controlled analytics over data streams without revealing which owners' data contributes to a specific query.

The technology decouples data storage from policy enforcement using Distributed Point Functions (DPF). Each data owner generates a pair of complementary DPF keys based on their data usage policy (e.g., "consumer type = hospital AND region = EU"). The two keys are stored separately, one on each of two non-colluding servers.

The data itself is processed using a novel two-server Symmetric Homomorphic Stream Encryption (SHSE) scheme. Owners split their data streams into additive shares and send each share to a different server. Each server independently encrypts its shares with its own secret key, and the servers then exchange ciphertext shares to construct a final homomorphically encrypted data stream that is stored identically on both. This allows the servers to perform aggregations (e.g., summations over a time window) directly on ciphertexts.

When a query arrives, both servers evaluate all owners' DPF keys against the query's description. The DPF evaluation yields a secret-shared control value (0 or 1) for each owner, indicating whether the query matches that owner's policy. The servers then use these control values to securely and obliviously aggregate only the encrypted data from matched owners. A secure two-party computation (S2PC) protocol is used to generate decryption keys, allowing the servers to produce secret shares of the final plaintext aggregate result.

The system also supports complex policy conditions with AND, OR, and NOT operators through optimized constructions using hash digests and cuckoo hashing, ensuring constant evaluation cost regardless of policy complexity.
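
The query flow described above can be illustrated with a toy sketch. It is not the patented construction. The DPF is replaced by a naive additively shared one-hot table (same outputs, but keys as large as the policy domain), SHSE is replaced by a simple additive mask, and the decryption pad is computed in the clear rather than by S2PC. The modulus, function names, and sample values are all assumptions made for illustration.

```python
import secrets

P = 2**61 - 1  # illustrative prime modulus; not taken from the source


def share(value):
    """Split an integer into two additive shares modulo P."""
    r = secrets.randbelow(P)
    return r, (value - r) % P


def toy_policy_keys(domain_size, allowed):
    """Toy stand-in for DPF key generation: additively share the one-hot
    table that is 1 at index `allowed` and 0 elsewhere. A real DPF yields
    the same outputs with keys far smaller than the table."""
    pairs = [share(1 if x == allowed else 0) for x in range(domain_size)]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def server_aggregate(policy_keys, ciphertexts, query):
    """Work done by one server: evaluate each owner's key at the query
    index and weight that owner's ciphertext by the resulting share."""
    return sum(k[query] * c for k, c in zip(policy_keys, ciphertexts)) % P


# Hypothetical setup: consumer types are indices 0..3; three owners.
allowed_types = [2, 0, 2]   # each owner's permitted consumer type
readings = [70, 85, 64]     # each owner's value for one time window
pads = [secrets.randbelow(P) for _ in readings]
# Additive masking stands in for SHSE; both servers hold these ciphertexts.
ciphertexts = [(m + k) % P for m, k in zip(readings, pads)]

keys = [toy_policy_keys(4, t) for t in allowed_types]
keys_s0 = [k[0] for k in keys]
keys_s1 = [k[1] for k in keys]

query = 2  # a consumer of type 2 asks for the sum
masked_sum = (server_aggregate(keys_s0, ciphertexts, query)
              + server_aggregate(keys_s1, ciphertexts, query)) % P

# Computed in the clear only to check the sketch; the source says a secure
# two-party computation derives the decryption key instead.
pad_sum = sum(k for k, t in zip(pads, allowed_types) if t == query) % P
result = (masked_sum - pad_sum) % P
print(result)  # 134: owners 0 and 2 match the query, owner 1 does not
```

Each server touches every owner's ciphertext and sees only a random-looking share of each control value, so neither server learns on its own which owners contributed to the sum.
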
Finally, decentralized result release control is enforced by a Byzantine-fault-tolerant committee (RRC) that applies integrity, differential privacy, or payment-based policies before releasing results to consumers.
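
As a rough illustration of a privacy-based release policy, the sketch below gates release on a committee quorum and adds Laplace noise to the aggregate. The source does not specify the noise mechanism or the quorum rule. The Laplace mechanism, the 2f+1 threshold for a committee of 3f+1 members, and every name and parameter here are assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def release(aggregate, approvals, committee_size, sensitivity, epsilon, rng=None):
    """Return a noised aggregate if a 2f+1 quorum approves, else None.

    `sensitivity` is the most one owner can change the aggregate and
    `epsilon` is the privacy budget; both are hypothetical parameters.
    """
    rng = rng or random.SystemRandom()  # OS-backed randomness by default
    f = (committee_size - 1) // 3       # faults tolerated when n >= 3f + 1
    if approvals < 2 * f + 1:
        return None                     # not enough committee members agreed
    return aggregate + laplace_noise(sensitivity / epsilon, rng)


# Four members tolerate one fault, so three approvals are required.
print(release(134, approvals=2, committee_size=4, sensitivity=1.0, epsilon=0.5))
print(release(134, approvals=3, committee_size=4, sensitivity=1.0, epsilon=0.5))
```

The first call prints `None` because the quorum is not met. The second prints 134 plus a random offset drawn with scale `sensitivity / epsilon`.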

Advantages  

  • Metadata Hiding: Completely obscures data access patterns during query execution, preventing attackers from inferring sensitive information about data owners based on query participation.
  • Decentralized Trust: Eliminates reliance on a single trusted authority by using two non-colluding servers and a decentralized committee for result release, enhancing security and resilience.
  • Owner Control & Privacy: Provides data owners with enforceable, fine-grained control over how and by whom their data is used, while maintaining data confidentiality through encryption and secret sharing.
  • Practical Efficiency: Employs optimized cryptographic constructions (DPF, SHSE) and integrates with scalable data pipelines such as Apache Kafka, making it suitable for real-time, large-scale data stream analytics with manageable overhead.
  • Rich Policy Support: Enables complex data usage policies combining multiple conditions with AND, OR, and NOT operators, offering flexibility to data owners.
  • End-to-End Enforcement: Incorporates release policies (integrity, privacy, payment) that are enforced after computation, providing additional layers of control and protection for aggregated results.

Applications  

  • Healthcare Analytics: Secure aggregation of patient data from wearable devices (e.g., heart rate, glucose levels) for medical research by authorized institutions, without revealing individual patient identities or their specific conditions.
  • Financial Services: Privacy-preserving analysis of transaction streams or financial behavior across multiple banks for fraud detection or market trend analysis, complying with strict data sovereignty regulations.
  • Smart City/IoT: Collective analysis of sensor data (traffic, energy usage, environmental metrics) from citizens and businesses for urban planning, while protecting individual contributor privacy.
  • Federated Market Research: Companies can query aggregated consumer preference or behavior data from a pool of individuals who have specified precise conditions for data usage, enabling insights without personal data exposure.
  • Government & Census: Secure compilation of statistics from sensitive citizen data for policy-making, ensuring individual records cannot be linked to query results or inferred from access patterns.

Remarks: IDF:1425
IP Status: Patent filed
Technology Readiness Level (TRL): 4