10 Hardware Acceleration and Future Evolution
10.1 Current Hardware Configuration
10.1.1 Server NICs
Each server has 2 × 100G NICs (ConnectX-6 DX):
- eth0: 100G connection to ToR-A (Network-A)
- eth1: 100G connection to ToR-B (Network-B)
- Total aggregate: 200G per server via pure L3 ECMP
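A minimal addressing sketch for one server under this layout (the /32 loopback value reuses the example from section 10.3.2.3; the point-to-point /31 addresses are illustrative assumptions, not deployed values):
# Loopback /32 is the server's fabric identity (value reused from 10.3.2.3)
ip addr add 10.0.1.11/32 dev lo
# Point-to-point /31s toward each ToR (addresses are illustrative)
ip addr add 192.168.101.1/31 dev eth0   # uplink to ToR-A (Network-A)
ip addr add 192.168.102.1/31 dev eth1   # uplink to ToR-B (Network-B)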
10.1.2 ConnectX-6 DX Hardware Acceleration
Mellanox/NVIDIA ConnectX-6 DX provides hardware acceleration for OVN/OVS:
10.1.2.1 GENEVE Offload
- Hardware GENEVE encapsulation/decapsulation: Offloads GENEVE processing from CPU
- Flow steering: Hardware-based packet classification and forwarding
- OVS hardware offload: Direct integration with OVS for accelerated forwarding
10.1.2.2 Benefits
- Reduced CPU overhead: GENEVE processing handled by NIC
- Higher throughput: Hardware acceleration provides line-rate performance
- Lower latency: Hardware forwarding faster than software
- Better scalability: More CPU available for workloads
10.1.2.3 Configuration
# Enable OVS hardware offload on ConnectX-6 DX
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
# Verify offload status
ovs-vsctl get Open_vSwitch . other_config:hw-offload

Reference: NVIDIA ConnectX-6 DX Documentation
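As a hedged follow-up sketch: the NIC's embedded switch must be in switchdev mode before OVS can offload flows, and offloaded datapath flows can then be listed (the PCI address and service name are illustrative and vary by platform):
# Put the ConnectX-6 DX eSwitch into switchdev mode (PCI address is illustrative)
devlink dev eswitch set pci/0000:3b:00.0 mode switchdev
# Restart OVS so the hw-offload setting takes effect
# (service name may be openvswitch or openvswitch-switch depending on the distro)
systemctl restart openvswitch
# List datapath flows currently offloaded to the NIC
ovs-appctl dpctl/dump-flows type=offloaded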
10.1.3 Switch Hardware
10.1.3.1 ToR Switches
- Option 1: 100G switches with 64 ports (e.g., Tomahawk-based)
- Option 2: 200G switches with 32 ports (e.g., Tomahawk-based)
- Chip: Broadcom Tomahawk ASIC
- Function: Pure L3 routing with BGP/ECMP
10.1.3.2 Spine Switches
- 400G switches (e.g., Tomahawk-based)
- High port density for leaf-spine connectivity
- Function: Pure L3 transit with ECMP
Key: All switches are L3 routers, not L2 switches. Tomahawk ASICs provide excellent L3 forwarding performance.
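As an illustrative sketch of what "pure L3 routing with BGP/ECMP" looks like on a ToR, assuming a network OS that uses FRR (the ASN, peer-group, and port names are hypothetical):
! Hypothetical FRR snippet on a ToR: unnumbered eBGP to servers, ECMP enabled
router bgp 65101
 bgp bestpath as-path multipath-relax
 neighbor SERVERS peer-group
 neighbor SERVERS remote-as external
 neighbor swp1 interface peer-group SERVERS
 neighbor swp2 interface peer-group SERVERS
 address-family ipv4 unicast
  redistribute connected
  maximum-paths 64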
10.2 Future Evolution: DPUs (Data Processing Units)
10.2.1 What are DPUs?
DPUs (Data Processing Units) are specialized processors that offload networking, storage, and security functions from the host CPU. Examples include:
- NVIDIA BlueField DPU
- AMD Pensando
- Intel IPU (Infrastructure Processing Unit)
10.2.2 How DPUs Fit Our Architecture
10.2.2.1 Current Architecture (Host-Based TEPs)
┌─────────────────────────────────────┐
│  Host CPU                           │
│  ┌──────────┐    ┌──────────┐       │
│  │   OVN    │    │   OVS    │       │
│  │ Control  │    │ Dataplane│       │
│  └────┬─────┘    └────┬─────┘       │
│       │               │             │
│  ┌────▼───────────────▼─────┐       │
│  │   ConnectX-6 DX (NIC)    │       │
│  │ Hardware GENEVE Offload  │       │
│  └──────────────────────────┘       │
└─────────────────────────────────────┘
10.2.2.2 Future Architecture (DPU-Based TEPs)
┌─────────────────────────────────────┐
│  Host CPU (Workloads Only)          │
│  ┌──────────┐                       │
│  │   VMs    │                       │
│  │   Pods   │                       │
│  └────┬─────┘                       │
│       │                             │
│  ┌────▼──────────────────────────┐  │
│  │         BlueField DPU         │  │
│  │  ┌──────────┐  ┌──────────┐   │  │
│  │  │   OVN    │  │   OVS    │   │  │
│  │  │ Control  │  │ Dataplane│   │  │
│  │  └──────────┘  └──────────┘   │  │
│  │  Hardware GENEVE Offload      │  │
│  │  Hardware BGP/ECMP            │  │
│  └───────────────────────────────┘  │
└─────────────────────────────────────┘
10.2.3 Benefits of DPU Evolution
- Host CPU Offload: OVN/OVS processing moves to DPU, freeing host CPU for workloads
- Hardware Acceleration: DPUs provide hardware acceleration for:
- GENEVE encapsulation/decapsulation
- BGP routing
- ECMP load balancing
- Security policies (ACLs, firewalling), as illustrated after this list
- Consistent Architecture: TEPs still at “host” (now DPU), fabric still pure L3
- Better Performance: Dedicated processing for networking functions
- Isolation: Network processing isolated from workload CPU
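A concrete instance of the security-policy item above: the policy is still expressed at the OVN layer exactly as today; with a DPU, the OVS flows that implement it execute on the DPU instead of the host CPU. The logical switch name and match below are illustrative:
# A distributed ACL defined via ovn-nbctl as usual; with a DPU-based TEP,
# enforcement happens in the DPU's OVS datapath (switch name and match illustrative)
ovn-nbctl acl-add ls-tenant1 to-lport 1001 'ip4.src == 10.10.0.0/24 && tcp.dst == 22' allow-related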
10.2.4 Migration Path
When migrating to DPUs:
- TEP moves to DPU: DPU becomes the TEP endpoint
- Fabric unchanged: Still pure L3 BGP/ECMP
- OVN control plane: Runs on DPU, connects to same OVN databases
- BGP on DPU: DPU advertises host loopback via BGP
- Zero fabric changes: Underlay architecture remains identical
Key Insight: DPU evolution is transparent to the fabric. The underlay remains pure L3 BGP/ECMP regardless of where TEPs run.
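A minimal sketch of the "BGP on DPU" step above, assuming FRR running on the DPU's Arm cores (the ASNs and ToR peer addresses are illustrative; the /32 reuses the loopback from section 10.3.2.3):
! Hypothetical FRR config on the DPU, advertising the host loopback to both fabrics
! (ToR-A peer on Network-A, ToR-B peer on Network-B; addresses are assumptions)
router bgp 65011
 neighbor 192.168.101.0 remote-as 65101
 neighbor 192.168.102.0 remote-as 65102
 address-family ipv4 unicast
  network 10.0.1.11/32
  maximum-paths 2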
10.3 Future Evolution: Higher Bandwidth Servers
10.3.1 Current: 2 × 100G (200G aggregate)
10.3.2 Future: 2 × 400G (800G aggregate)
10.3.2.1 Architecture Extension
No changes needed to fabric architecture:
- Same topology: Independent A/B Fabrics
- Same routing: Pure L3 BGP/ECMP
- Same principles: Loopback advertised via both NICs
- ECMP scales: Automatically handles higher bandwidth
10.3.2.2 What Changes
- NIC speeds: 100G → 400G per NIC
- Switch ports: ToR switches need 400G ports (or aggregate multiple 100G)
- Link speeds: Point-to-point links become 400G
- ECMP behavior: Same, just more bandwidth per path
10.3.2.3 Example Evolution
Current (2 × 100G):
- eth0: 100G → ToR-A
- eth1: 100G → ToR-B
- Loopback: 10.0.1.11/32 advertised via both

Future (2 × 400G):
- eth0: 400G → ToR-A (or 4×100G aggregated)
- eth1: 400G → ToR-B (or 4×100G aggregated)
- Loopback: 10.0.1.11/32 advertised via both (same!)
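A small verification sketch of the claim that only link speed changes (interface names as above; the speeds in the comments are expectations, not captured output):
# Link speed is the only server-visible change after the NIC upgrade
ethtool eth0 | grep -i speed    # 100000Mb/s today, 400000Mb/s after
ethtool eth1 | grep -i speed
# The loopback identity and its BGP advertisement are untouched
ip addr show dev lo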
Key: The architecture is bandwidth-agnostic. Same design principles apply at any speed.
10.3.3 Future: 2 × 800G (1.6T aggregate)
Same principles:
- Independent A/B Fabrics
- Pure L3 BGP/ECMP
- Loopback-based identity
- ECMP automatic load balancing
Scalability: The architecture scales seamlessly from 100G to 800G+ per server.
10.4 Switch Evolution
10.4.1 Current ToR Options
- 100G × 64 ports: Sufficient for current server density
- 200G × 32 ports: Higher bandwidth per port
10.4.2 Future ToR Options
- 400G × 32 ports: For 400G servers
- 800G × 16 ports: For 800G servers
10.4.3 Spine Evolution
- Current: 400G spine switches
- Future: 800G or 1.6T spine switches
Key: Spine capacity must scale with aggregate ToR bandwidth.
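A back-of-envelope check of that constraint, using the 64-port 100G ToR option from 10.4.1 (the 48/16 downlink/uplink split is an assumption, not part of the design):
# Rough oversubscription math for a 64 x 100G ToR split 48 down / 16 up
downlink_gbps=$((48 * 100))   # 4800G toward servers
uplink_gbps=$((16 * 100))     # 1600G toward spines
echo "oversubscription $((downlink_gbps / uplink_gbps)):1"   # prints 3:1
Keeping that ratio in check as server NICs move to 400G and 800G is what drives the spine upgrades above.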