System and Method for Load Balancing in Data Networks

Opportunity

Modern data centers depend on efficient data networks to carry ever-growing traffic volumes. A significant challenge in such networks, especially in multi-layer architectures such as Clos or fat-tree topologies, is uneven traffic distribution: some paths become congested while others remain underutilized. Traditional load-balancing methods such as Equal-Cost Multi-Path (ECMP) routing are congestion-agnostic: they select paths without regard to real-time congestion levels and do not adapt to asymmetric network conditions caused by failures or varying traffic loads. The resulting suboptimal path selection can reduce throughput for large data flows ("elephant flows") and increase latency for small, time-sensitive flows ("mice flows"). This patent is motivated by the need for a scalable, congestion-aware load-balancing solution that adapts dynamically to network conditions without excessive computational overhead.

Technology

The patent combines a two-stage path selection process with localized congestion monitoring to address the limitations of existing load-balancing methods. The system operates in a distributed manner across network switches (e.g., top-of-rack (ToR) switches, aggregation switches, and core switches) and requires no centralized control.

1. Localized Congestion Monitoring: Each switch continuously monitors the congestion levels of its uplinks in both ingress and egress directions. This is achieved by tracking packet or byte counts and updating congestion metrics in real time. The congestion data is stored in a dedicated header within data packets, enabling efficient propagation of congestion information across the network.

2. Two-Stage Path Selection:
- Stage 1: Identifies the least congested path segment between the source and destination ToR switches by aggregating congestion metrics from the source's egress links and the destination's ingress links.
- Stage 2: Determines the optimal intermediate path segments (e.g., through aggregation or core switches) by comparing congestion levels of available uplinks. The process uses heuristic methods to avoid exhaustive path probing, reducing computational overhead.
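
The two steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names LinkMonitor, least_congested, and two_stage_select are hypothetical, the congestion metric is assumed to be link utilization derived from byte counts, and "aggregating" metrics is assumed to mean adding the egress and ingress values. The summary does not specify any of these details.

```python
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass
class LinkMonitor:
    """Byte counter for one direction (ingress or egress) of one uplink."""
    capacity_bps: float  # link speed in bits per second (assumed known)
    bytes_seen: int = 0

    def record(self, nbytes: int) -> None:
        """Count the bytes of one forwarded packet."""
        self.bytes_seen += nbytes

    def sample(self, interval_s: float) -> float:
        """Return utilization in [0, 1] for the last interval, then reset."""
        util = (self.bytes_seen * 8) / (self.capacity_bps * interval_s)
        self.bytes_seen = 0
        return min(1.0, util)


def least_congested(egress: Mapping[str, float],
                    ingress: Mapping[str, float]) -> str:
    """Pick the candidate with the lowest combined egress + ingress metric.

    Summation is an assumption of this sketch. Ties go to the
    lexicographically smallest candidate name.
    """
    candidates = sorted(egress.keys() & ingress.keys())
    if not candidates:
        raise ValueError("no candidate is reachable from both ends")
    return min(candidates, key=lambda name: egress[name] + ingress[name])


def two_stage_select(
    tor_egress: Mapping[str, float],
    tor_ingress: Mapping[str, float],
    agg_egress: Mapping[str, Mapping[str, float]],
    agg_ingress: Mapping[str, Mapping[str, float]],
) -> Tuple[str, str]:
    """Return (aggregation choice, core choice) for a new flow."""
    # Stage 1: source ToR egress metrics combined with destination ToR
    # ingress metrics, one entry per aggregation-level uplink.
    agg = least_congested(tor_egress, tor_ingress)
    # Stage 2: repeat the comparison one tier up, only for the uplinks of
    # the aggregation switch chosen in stage 1.
    core = least_congested(agg_egress[agg], agg_ingress[agg])
    return agg, core


if __name__ == "__main__":
    choice = two_stage_select(
        tor_egress={"agg1": 0.7, "agg2": 0.2},
        tor_ingress={"agg1": 0.1, "agg2": 0.3},
        agg_egress={"agg2": {"core3": 0.4, "core4": 0.1}},
        agg_ingress={"agg2": {"core3": 0.1, "core4": 0.6}},
    )
    print(choice)  # ('agg2', 'core3')
```

Only the uplinks of the switch chosen in stage 1 are examined in stage 2, which is how a two-stage heuristic avoids scoring every end-to-end path.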

The system encapsulates congestion data in specialized Ethernet headers (similar to VLAN tags) to facilitate efficient communication between switches. Path selection decisions are recorded in a Path Selection Table (PST) at each switch, which times out inactive entries to adapt to changing network conditions.
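
As a rough illustration of the header and table described above, the following Python sketch packs a congestion value into a 4-byte tag and keeps per-flow decisions in a table that expires idle entries. Every detail is an assumption made for the example: the tag layout (a 16-bit type field, an 8-bit port index, and an 8-bit quantized metric), the use of the IEEE local experimental EtherType 0x88B5, the flow key as the table index, and the names pack_tag, unpack_tag, and PathSelectionTable. The summary does not give the actual field layout or timeout policy.

```python
import struct
import time
from typing import Callable, Dict, Optional, Tuple

TAG_TYPE = 0x88B5          # IEEE local experimental EtherType, used as a stand-in
TAG_FORMAT = "!HBB"        # network byte order: type, port index, metric


def pack_tag(port: int, metric: float) -> bytes:
    """Encode a port index and a congestion metric in [0, 1] as 4 bytes."""
    if not 0 <= port <= 255:
        raise ValueError("port index must fit in one byte")
    quantized = round(min(1.0, max(0.0, metric)) * 255)
    return struct.pack(TAG_FORMAT, TAG_TYPE, port, quantized)


def unpack_tag(tag: bytes) -> Tuple[int, float]:
    """Decode a tag produced by pack_tag into (port, metric)."""
    tag_type, port, quantized = struct.unpack(TAG_FORMAT, tag)
    if tag_type != TAG_TYPE:
        raise ValueError("not a congestion tag")
    return port, quantized / 255


class PathSelectionTable:
    """Maps a flow key to its chosen uplink and drops idle entries."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (port, last use)

    def record(self, flow_key: str, port: str) -> None:
        """Store the path decision for a flow."""
        self._entries[flow_key] = (port, self._clock())

    def lookup(self, flow_key: str) -> Optional[str]:
        """Return the stored port, or None if absent or timed out."""
        entry = self._entries.get(flow_key)
        if entry is None:
            return None
        port, last_used = entry
        now = self._clock()
        if now - last_used > self._timeout_s:
            # Inactive for too long: forget the decision so the next
            # packet of this flow triggers a fresh path selection.
            del self._entries[flow_key]
            return None
        self._entries[flow_key] = (port, now)  # refresh on use
        return port
```

A lookup that returns None would be the point at which a switch runs path selection again, so a flow that has gone idle picks up current congestion conditions when it resumes.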

Advantages

Congestion-Aware: Dynamically adjusts paths based on real-time congestion data, improving throughput and reducing latency.
Scalable: Distributed protocol design eliminates the need for global congestion tracking, making it suitable for large-scale data centers.
Fault-Tolerant: Automatically reroutes traffic around failed or congested links using localized metrics.
Low Overhead: Two-stage heuristic reduces computational complexity compared to full-path probing.
Compatible: Works with existing network hardware and topologies (e.g., Clos, fat-tree).

Applications

Data center networks requiring high throughput and low latency.
Cloud computing infrastructures with dynamic traffic patterns.
Distributed systems where asymmetric network conditions are common.
Real-time applications sensitive to network congestion (e.g., video streaming, financial transactions).

Remarks

IDF: 461

IP Status

Patent granted

Technology Readiness Level (TRL)