3 Network Underlay
3.1 Overview
This chapter covers the L3 underlay network - the physical infrastructure that provides IP connectivity between hosts. The underlay is a pure L3 BGP/ECMP fabric with an Independent A/B Fabrics architecture (see 3.4).
Note: For definitions of terms used in this chapter, see the Glossary.
3.2 What is the Underlay?
The underlay network is the physical Layer 3 network infrastructure that:
- Provides IP routing between hosts
- Uses BGP for route advertisement
- Uses ECMP for load balancing
- Is completely unaware of overlay networks (GENEVE, VMs, containers)
Key Principle: The underlay’s only job is to move IP packets between hosts reliably and at full bandwidth.
3.3 Pure L3 Design
Our underlay is pure L3 - no L2 constructs:
- No bridges: All switching is L3 routing
- No VLANs: IP-only underlay
- No EVPN/VXLAN at fabric: Fabric only routes IP packets
- Point-to-point links: /31 links between all devices
- BGP routing: Standard BGP for route advertisement
- ECMP: Automatic load balancing across multiple paths
3.4 Independent A/B Fabrics
The underlay consists of two completely independent L3 networks:
- Fabric-A: All ToR-A switches and Spine-A switches
- Fabric-B: All ToR-B switches and Spine-B switches
- Zero shared state: No peer-links, no MLAG, no shared control plane
Each server connects to both fabrics via separate NICs (sketched below):
- eth0 → Fabric-A (ToR-A)
- eth1 → Fabric-B (ToR-B)
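To make the zero-shared-state property concrete, here is a minimal sketch that models the two fabrics as plain data and checks that they share no devices. The rack count and device names are illustrative assumptions, not the real inventory.

```python
# Minimal model of Independent A/B Fabrics (rack count and device names
# are illustrative assumptions, not the real inventory).
RACKS = [1, 2, 3]

fabric_a = {f"tor-a-{r}" for r in RACKS} | {"spine-a-1", "spine-a-2"}
fabric_b = {f"tor-b-{r}" for r in RACKS} | {"spine-b-1", "spine-b-2"}

# Zero shared state: no device, peer-link, or control plane in common.
assert fabric_a.isdisjoint(fabric_b)

def server_uplinks(rack: int) -> dict[str, str]:
    """Each server is dual-homed: eth0 into Fabric-A, eth1 into Fabric-B."""
    return {"eth0": f"tor-a-{rack}", "eth1": f"tor-b-{rack}"}

print(server_uplinks(rack=1))  # {'eth0': 'tor-a-1', 'eth1': 'tor-b-1'}
```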
3.5 Topology Evolution
3.5.1 Phase 1: Mesh Topology (5-6 racks)
- Fabric-A: All ToR-A switches interconnect in mesh via BGP
- Fabric-B: All ToR-B switches interconnect in mesh via BGP
- 8 uplink ports per ToR: Sufficient for mesh connectivity
- No spine switches needed: Mesh works well for small scale
3.5.2 Phase 2: Leaf-Spine Topology (7+ racks)
- Fabric-A: All ToR-A switches connect to Spine-A switches via BGP
- Fabric-B: All ToR-B switches connect to Spine-B switches via BGP
- Scalable: Leaf-spine provides a non-blocking fabric (see the sketch after this list)
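A rough way to see the scaling pressure, sketched under the simplifying assumption of one link per device pair: in a full mesh every ToR needs an uplink to each other ToR in its fabric, while in leaf-spine it needs one uplink per spine. Port count is only one of the constraints behind the 7-rack cut-over (per-pair bandwidth in a mesh matters too), so treat the numbers as illustrative; the spine count used below is an assumption.

```python
# Uplink-count arithmetic for the two phases, assuming one link per device
# pair (real builds may bundle several parallel links).
UPLINK_PORTS_PER_TOR = 8  # from the mesh phase above

def mesh_uplinks_per_tor(racks: int) -> int:
    # Full mesh inside one fabric: each ToR peers with every other ToR.
    return racks - 1

def leaf_spine_uplinks_per_tor(spines: int) -> int:
    # Leaf-spine: each ToR connects once to every spine in its fabric.
    return spines

for racks in (5, 6, 12):
    need = mesh_uplinks_per_tor(racks)
    fits = "fits within" if need <= UPLINK_PORTS_PER_TOR else "exceeds"
    print(f"{racks} racks: mesh needs {need} uplinks per ToR, "
          f"{fits} {UPLINK_PORTS_PER_TOR} uplink ports")

# Hypothetical 4-spine fabric, for comparison.
print(f"leaf-spine with 4 spines: {leaf_spine_uplinks_per_tor(4)} uplinks per ToR")
```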
3.6 BGP Routing
3.6.1 Route Advertisement
- Host loopbacks: Each host advertises its loopback IP (10.0.x.y/32) via BGP
- ToR loopbacks: Each ToR advertises its loopback IP (10.254.x.y/32) via BGP
- Spine loopbacks: Each spine advertises its loopback IP (10.255.0.x/32) via BGP
3.6.2 eBGP Peering
- Server ↔︎ ToR: eBGP peering on point-to-point links
- ToR ↔︎ Spine: eBGP peering on point-to-point links
- No iBGP: All peering is external BGP (eBGP)
- No route reflectors: Direct eBGP peering everywhere
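A minimal sketch of one host ↔︎ ToR eBGP session, reusing the /31 format from 3.9.4. The private-ASN numbering is purely hypothetical (the real scheme lives in the BGP & Routing Configuration chapter), and which end of the /31 takes the even address, as well as how the Fabric-A and Fabric-B host links are kept apart, are assumptions here.

```python
import ipaddress

# Hypothetical private-ASN numbering, for illustration only (the real plan
# is in the BGP & Routing Configuration chapter).
def host_asn(rack: int, host: int) -> int:
    return 4_210_000_000 + rack * 1_000 + host

def tor_asn(rack: int, tor: str) -> int:  # tor is "a" or "b"
    return 4_200_000_000 + rack * 10 + (1 if tor == "a" else 2)

def host_tor_session(rack: int, host: int, tor: str) -> dict:
    """One of the server's two eBGP sessions, over its 172.16.{rack}.{host*2}/31
    link (which end takes the even address is an assumption)."""
    link = ipaddress.ip_network(f"172.16.{rack}.{host * 2}/31")
    host_ip, tor_ip = link.hosts()  # Python 3.8+: both /31 addresses are usable
    return {
        "local_ip": str(host_ip), "local_asn": host_asn(rack, host),
        "peer_ip": str(tor_ip), "peer_asn": tor_asn(rack, tor),
        "session": "eBGP",  # distinct ASNs on each end; no iBGP, no route reflectors
    }

print(host_tor_session(rack=1, host=11, tor="a"))
```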
3.7 ECMP Load Balancing
3.7.1 Equal-Cost Paths
When the same prefix is learned over multiple paths with equal BGP attributes, ECMP automatically installs all of them as equal-cost routes.
Example: Host loopback 10.0.1.11/32 advertised via:
- eth0 → ToR-A → Spine-A paths
- eth1 → ToR-B → Spine-B paths
Result: ECMP distributes traffic across all available paths.
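Conceptually the outcome is a single forwarding entry for the /32 carrying several equal-cost next-hops; a minimal sketch of that entry (next-hop values are placeholders, not from the addressing plan):

```python
# Conceptual ECMP entry for a remote host loopback: one prefix, several
# equal-cost next-hops (next-hop values are placeholders).
ecmp_route = {
    "prefix": "10.0.1.11/32",
    "nexthops": [
        {"via": "<ToR-A p2p address>", "dev": "eth0"},  # Fabric-A path
        {"via": "<ToR-B p2p address>", "dev": "eth1"},  # Fabric-B path
    ],
}
print(ecmp_route["prefix"], "->", [n["dev"] for n in ecmp_route["nexthops"]])
```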
3.7.2 5-Tuple Hashing
ECMP uses 5-tuple hashing (source IP, destination IP, source port, destination port, protocol) to distribute traffic (see the sketch after this list):
- Same flow (same 5-tuple) → same path (no reordering)
- Different flows → different paths (good distribution)
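A minimal sketch of flow-to-path pinning, using a software hash as a stand-in (real switches use vendor-specific hardware hash functions and their own path tables):

```python
import hashlib

# Equal-cost paths toward one destination (illustrative labels).
PATHS = ["eth0 -> ToR-A -> Spine-A", "eth1 -> ToR-B -> Spine-B"]

def ecmp_path(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              proto: str) -> str:
    """Map a 5-tuple to a path: same flow -> same path, so no reordering."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    index = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(PATHS)
    return PATHS[index]

# Two flows between the same hosts: different source ports can land on
# different paths, while each individual flow always sees a single path.
print(ecmp_path("10.0.1.11", "10.0.2.12", 49152, 6081, "udp"))
print(ecmp_path("10.0.1.11", "10.0.2.12", 49153, 6081, "udp"))
```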
3.8 BFD (Bidirectional Forwarding Detection)
BFD provides fast failure detection (a timing sketch follows this list):
- Interval: 100-300ms (configurable)
- Failure detection: <1 second
- Integration: Works with BGP to quickly withdraw failed routes
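BFD declares a peer down after a number of consecutive missed packets; the detect multiplier of 3 below is an assumed common default (the section only fixes the 100-300 ms interval and the <1 s target):

```python
# BFD detection time = tx interval x detect multiplier. The multiplier of 3
# is an assumption (a common default); the doc only specifies the interval.
DETECT_MULTIPLIER = 3

def bfd_detection_time_ms(tx_interval_ms: int) -> int:
    return tx_interval_ms * DETECT_MULTIPLIER

for interval_ms in (100, 300):
    t = bfd_detection_time_ms(interval_ms)
    verdict = "meets" if t < 1000 else "misses"
    print(f"{interval_ms} ms interval -> ~{t} ms detection ({verdict} the <1 s target)")
```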
3.9 IP Addressing
3.9.1 Host Loopbacks
- Range: 10.0.x.y/32
- Format: 10.0.{rack}.{host}/32
- Example: Rack 1, Host 11 = 10.0.1.11/32
- Purpose: Server identity, OVN TEP IP
3.9.2 ToR Loopbacks
- Range: 10.254.x.y/32
- Format: 10.254.{rack}.{tor}/32
- Example: Rack 1, ToR-A = 10.254.1.1/32
- Purpose: ToR identity
3.9.3 Spine Loopbacks
- Range: 10.255.0.x/32
- Format: 10.255.0.{spine}/32
- Example: Spine 1 = 10.255.0.1/32
- Purpose: Spine identity
3.9.4 Point-to-Point Links
- Host ↔︎ ToR: 172.16.{rack}.{host*2}/31
- ToR ↔︎ Spine: 172.20.{rack}.{link}/31
- All links: /31 addressing (RFC 3021)
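The formats above are mechanical, so every underlay address can be derived from the (rack, host/tor/spine) indices. A minimal sketch using the standard ipaddress module; the helper names are hypothetical and the even/odd split of each /31 is an assumption:

```python
import ipaddress

# Derive addresses from the formats in 3.9 (helper names are hypothetical).
def host_loopback(rack: int, host: int) -> ipaddress.IPv4Interface:
    return ipaddress.ip_interface(f"10.0.{rack}.{host}/32")

def tor_loopback(rack: int, tor: int) -> ipaddress.IPv4Interface:
    return ipaddress.ip_interface(f"10.254.{rack}.{tor}/32")

def spine_loopback(spine: int) -> ipaddress.IPv4Interface:
    return ipaddress.ip_interface(f"10.255.0.{spine}/32")

def host_tor_link(rack: int, host: int) -> ipaddress.IPv4Network:
    # 172.16.{rack}.{host*2}/31; per RFC 3021 both addresses are usable.
    return ipaddress.ip_network(f"172.16.{rack}.{host * 2}/31")

# Rack 1, Host 11 reproduces the examples given in 3.9.
print(host_loopback(1, 11))   # 10.0.1.11/32
print(tor_loopback(1, 1))     # 10.254.1.1/32
print(spine_loopback(1))      # 10.255.0.1/32
print([str(a) for a in host_tor_link(1, 11).hosts()])  # Python 3.8+: ['172.16.1.22', '172.16.1.23']
```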
3.10 Hardware
3.10.1 Switches
- ToR: 100G × 64 ports or 200G × 32 ports (Tomahawk-based)
- Spine: 400G switches (Tomahawk-based)
- Function: Pure L3 routers with BGP/ECMP
3.10.2 Server NICs
- 2 × 100G per server (NVIDIA ConnectX-6 DX)
- Hardware acceleration: GENEVE offload (for overlay)
- Underlay function: Pure L3 routing
3.11 Key Takeaways
- Pure L3: No L2 constructs, no EVPN/VXLAN at fabric
- Independent A/B Fabrics: Two separate networks with zero shared state
- BGP/ECMP: Standard protocols for routing and load balancing
- Scalable: Mesh for small scale, leaf-spine for large scale
- Simple: ~50 config lines per switch vs 300+ for EVPN
3.12 References
- Network Architecture Overview - Architecture principles and design
- Network Design & IP Addressing - Concrete IP plan
- BGP & Routing Configuration - BGP configuration details