3  Network Underlay

3.1 Overview

This chapter covers the L3 underlay network: the physical infrastructure that provides IP connectivity between hosts. The underlay is a pure L3 BGP/ECMP fabric built on the Independent A/B Fabrics architecture.

Note: For definitions of terms used in this chapter, see the Glossary.

3.2 What is the Underlay?

The underlay network is the physical Layer 3 network infrastructure that:

  • Provides IP routing between hosts
  • Uses BGP for route advertisement
  • Uses ECMP for load balancing
  • Is completely unaware of overlay networks (GENEVE, VMs, containers)

Key Principle: The underlay’s only job is to move IP packets between hosts reliably and at full bandwidth.

3.3 Pure L3 Design

Our underlay is pure L3, with no L2 constructs:

  • No bridges: All switching is L3 routing
  • No VLANs: IP-only underlay
  • No EVPN/VXLAN at fabric: Fabric only routes IP packets
  • Point-to-point links: /31 links between all devices
  • BGP routing: Standard BGP for route advertisement
  • ECMP: Automatic load balancing across multiple paths
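
To make the /31 point plain: per RFC 3021, a /31 carries exactly two addresses and no broadcast, which is all a point-to-point fabric link needs. A minimal sketch using Python's standard ipaddress module (the 10.1.0.0/31 prefix is a placeholder, not an address from this design's link plan):

    import ipaddress

    # A /31 (RFC 3021) holds exactly two addresses and no broadcast,
    # which is exactly what a point-to-point fabric link needs.
    # 10.1.0.0/31 is an illustrative prefix, not part of this design.
    link = ipaddress.ip_network("10.1.0.0/31")

    a_side, b_side = list(link)      # the two usable endpoints
    print(a_side, b_side)            # 10.1.0.0 10.1.0.1
    print(link.num_addresses)        # 2 -- nothing wasted on network/broadcast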

3.4 Independent A/B Fabrics

The underlay consists of two completely independent L3 networks:

  • Fabric-A: All ToR-A switches and Spine-A switches
  • Fabric-B: All ToR-B switches and Spine-B switches
  • Zero shared state: No peer-links, no MLAG, no shared control plane

Each server connects to both fabrics via separate NICs:

  • eth0 → Fabric-A (ToR-A)
  • eth1 → Fabric-B (ToR-B)
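
A minimal sketch of this dual-homing, as a data model rather than configuration (the ToR hostnames are placeholders; only the eth0/eth1-to-fabric mapping comes from the text):

    from dataclasses import dataclass

    # Illustrative data model of a dual-homed server: one routed uplink per
    # fabric, no bond/LAG and no shared state between the two NICs.
    # ToR hostnames are placeholders, not this design's real naming scheme.
    @dataclass(frozen=True)
    class Uplink:
        nic: str      # server interface
        fabric: str   # "A" or "B"
        tor: str      # ToR switch this NIC cables to

    def server_uplinks(rack: int) -> list[Uplink]:
        """Every server in a rack gets the same pair of attachments."""
        return [
            Uplink(nic="eth0", fabric="A", tor=f"rack{rack}-tor-a"),
            Uplink(nic="eth1", fabric="B", tor=f"rack{rack}-tor-b"),
        ]

    for link in server_uplinks(rack=1):
        print(link)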

3.5 Topology Evolution

3.5.1 Phase 1: Mesh Topology (5-6 racks)

  • Fabric-A: All ToR-A switches interconnect in mesh via BGP
  • Fabric-B: All ToR-B switches interconnect in mesh via BGP
  • 8 uplink ports per ToR: Sufficient for mesh connectivity
  • No spine switches needed: Mesh works well for small scale
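
The 8-port budget works out because a full mesh needs one uplink to every other ToR in the same fabric: N-1 uplinks per ToR and N(N-1)/2 links per fabric for N racks. A small sketch of that arithmetic (assuming a single link per ToR pair):

    def mesh_requirements(racks: int, uplink_ports: int = 8):
        """Full-mesh sizing for one fabric: each ToR peers with every other ToR."""
        per_tor_links = racks - 1                 # one link to each other ToR
        total_links = racks * (racks - 1) // 2    # links in the whole fabric
        return per_tor_links, total_links, per_tor_links <= uplink_ports

    for racks in (5, 6, 7):
        print(racks, mesh_requirements(racks))
    # 5 racks -> 4 uplinks/ToR, 10 links  (fits in 8 ports)
    # 6 racks -> 5 uplinks/ToR, 15 links  (fits)
    # 7 racks -> 6 uplinks/ToR, 21 links  (still fits, but links and BGP
    #            sessions grow quadratically; the design moves to leaf-spine here)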

3.5.2 Phase 2: Leaf-Spine Topology (7+ racks)

  • Fabric-A: All ToR-A switches connect to Spine-A switches via BGP
  • Fabric-B: All ToR-B switches connect to Spine-B switches via BGP
  • Scalable: Leaf-spine provides a non-blocking fabric

3.6 BGP Routing

3.6.1 Route Advertisement

  • Host loopbacks: Each host advertises its loopback IP (10.0.x.y/32) via BGP
  • ToR loopbacks: Each ToR advertises its loopback IP (10.254.x.y/32) via BGP
  • Spine loopbacks: Each spine advertises its loopback IP (10.255.0.x/32) via BGP

3.6.2 eBGP Peering

  • Server ↔︎ ToR: eBGP peering on point-to-point links
  • ToR ↔︎ Spine: eBGP peering on point-to-point links
  • No iBGP: All peering is external BGP (eBGP)
  • No route reflectors: Direct eBGP peering everywhere
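
To give a feel for how small this peering model is, the sketch below renders an FRR-style BGP stanza for one dual-homed server. FRR itself, the ASNs, and the /31 peer addresses are assumptions for illustration, not values from this design:

    def server_bgp_config(loopback: str, asn: int, peers: dict[str, int]) -> str:
        """Render a minimal FRR-style eBGP stanza for a dual-homed server.

        `peers` maps ToR /31 neighbor address -> ToR ASN. All numbers here
        are illustrative placeholders, not this design's numbering plan.
        """
        lines = [
            f"router bgp {asn}",
            f" bgp router-id {loopback}",
            # Treat the ToR-A and ToR-B paths as equal cost even though
            # their AS paths differ (they are different ASNs).
            " bgp bestpath as-path multipath-relax",
        ]
        for addr, peer_asn in peers.items():
            lines.append(f" neighbor {addr} remote-as {peer_asn}")  # eBGP peer
        lines += [
            " address-family ipv4 unicast",
            f"  network {loopback}/32",   # advertise only the host loopback
            "  maximum-paths 2",          # allow ECMP across both fabrics
            " exit-address-family",
        ]
        return "\n".join(lines)

    print(server_bgp_config(
        loopback="10.0.1.11",
        asn=65111,                                          # assumed per-host ASN
        peers={"10.201.1.0": 65101, "10.202.1.0": 65102},   # assumed ToR /31 ends
    ))

The ToR and spine sides mirror this pattern with their own ASNs, which is how the per-switch configuration stays in the tens of lines noted in section 3.11.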

3.7 ECMP Load Balancing

3.7.1 Equal-Cost Paths

When the same prefix is learned from multiple neighbors with equal BGP attributes, the router installs all of those next-hops as equal-cost paths (ECMP).

Example: Host loopback 10.0.1.11/32 advertised via:

  • eth0 → ToR-A → Spine-A paths
  • eth1 → ToR-B → Spine-B paths

Result: ECMP distributes traffic across all available paths.

3.7.2 5-Tuple Hashing

ECMP uses 5-tuple hashing (source IP, destination IP, source port, destination port, protocol) to distribute traffic:

  • Same flow (same 5-tuple) → same path (no reordering)
  • Different flows → different paths (good distribution)
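
A toy model of this selection (the hash function and path labels are illustrative; real switches use a vendor-specific hardware hash over the same fields):

    import hashlib

    # Equal-cost next-hops installed for one destination prefix.
    # Path labels are illustrative only.
    paths = ["via ToR-A -> Spine-A", "via ToR-B -> Spine-B"]

    def pick_path(src_ip, dst_ip, src_port, dst_port, proto):
        """Toy 5-tuple hash: the same flow always maps to the same path."""
        key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
        digest = hashlib.sha256(key).digest()
        return paths[int.from_bytes(digest[:4], "big") % len(paths)]

    # Same 5-tuple -> same path on every packet, so no reordering within a flow.
    print(pick_path("10.0.1.11", "10.0.2.12", 40001, 443, "tcp"))
    print(pick_path("10.0.1.11", "10.0.2.12", 40001, 443, "tcp"))
    # A different source port is a different flow and may hash to the other path.
    print(pick_path("10.0.1.11", "10.0.2.12", 40002, 443, "tcp"))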

3.8 BFD (Bidirectional Forwarding Detection)

BFD provides fast failure detection:

  • Interval: 100-300ms (configurable)
  • Failure detection: <1 second
  • Integration: Works with BGP to quickly withdraw failed routes
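
The sub-second figure follows from BFD's detection rule: a session is declared down after the detect multiplier's worth of consecutive intervals with no packet received. A quick sketch of that arithmetic (the multiplier of 3 is a common default, assumed here rather than taken from this design):

    def bfd_detection_time_ms(tx_interval_ms: int, detect_mult: int = 3) -> int:
        """BFD declares a neighbor down after `detect_mult` missed packets."""
        return tx_interval_ms * detect_mult

    for interval in (100, 300):
        print(f"{interval}ms interval -> failure detected within "
              f"{bfd_detection_time_ms(interval)}ms")
    # 100ms -> 300ms, 300ms -> 900ms: both under the 1-second target, and far
    # faster than waiting on default BGP hold timers (typically 90-180s).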

3.9 IP Addressing

3.9.1 Host Loopbacks

  • Range: 10.0.x.y/32
  • Format: 10.0.{rack}.{host}/32
  • Example: Rack 1, Host 11 = 10.0.1.11/32
  • Purpose: Server identity, OVN TEP IP

3.9.2 ToR Loopbacks

  • Range: 10.254.x.y/32
  • Format: 10.254.{rack}.{tor}/32
  • Example: Rack 1, ToR-A = 10.254.1.1/32
  • Purpose: ToR identity

3.9.3 Spine Loopbacks

  • Range: 10.255.0.x/32
  • Format: 10.255.0.{spine}/32
  • Example: Spine 1 = 10.255.0.1/32
  • Purpose: Spine identity
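
The three formats above are mechanical enough to capture in code. A small sketch (function names are illustrative, and ToR-B = 2 is an assumed convention; the ranges and examples come from the text):

    import ipaddress

    def host_loopback(rack: int, host: int) -> str:
        """Host loopbacks: 10.0.{rack}.{host}/32 (also the OVN TEP IP)."""
        return f"10.0.{rack}.{host}/32"

    def tor_loopback(rack: int, tor: int) -> str:
        """ToR loopbacks: 10.254.{rack}.{tor}/32 (ToR-A = 1; ToR-B = 2 assumed)."""
        return f"10.254.{rack}.{tor}/32"

    def spine_loopback(spine: int) -> str:
        """Spine loopbacks: 10.255.0.{spine}/32."""
        return f"10.255.0.{spine}/32"

    # The examples from the text, validated as proper /32 prefixes.
    for addr in (host_loopback(1, 11), tor_loopback(1, 1), spine_loopback(1)):
        assert ipaddress.ip_network(addr).prefixlen == 32
        print(addr)
    # 10.0.1.11/32  10.254.1.1/32  10.255.0.1/32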

3.10 Hardware

3.10.1 Switches

  • ToR: 100G × 64 ports or 200G × 32 ports (Tomahawk-based)
  • Spine: 400G switches (Tomahawk-based)
  • Function: Pure L3 routers with BGP/ECMP

3.10.2 Server NICs

  • 2 × 100G per server (NVIDIA ConnectX-6 DX)
  • Hardware acceleration: GENEVE offload (for overlay)
  • Underlay function: Pure L3 routing

3.11 Key Takeaways

  1. Pure L3: No L2 constructs, no EVPN/VXLAN at fabric
  2. Independent A/B Fabrics: Two separate networks with zero shared state
  3. BGP/ECMP: Standard protocols for routing and load balancing
  4. Scalable: Mesh for small scale, leaf-spine for large scale
  5. Simple: ~50 config lines per switch vs 300+ for EVPN

3.12 References