7  Network Design & IP Addressing

7.1 Overview

This chapter provides the concrete implementation details for IP addressing and network topology. For architectural principles, see Network Architecture Overview.

Note: For definitions of terms like TEP, ECMP, BGP, and others, see the Glossary.

7.2 Scale

  • Start: 6 racks
  • Up to: ~25 servers/rack
  • Total: ≤150 servers

7.3 Hardware Specifications

7.3.1 Server NICs

  • 2 × 100G NICs per server (NVIDIA ConnectX-6 DX)
  • Hardware GENEVE offload enabled
  • Total aggregate: 200G per server via pure L3 ECMP

7.3.2 Switch Hardware

  • ToR Switches:
    • Option 1: 100G × 64 ports (Tomahawk-based)
    • Option 2: 200G × 32 ports (Tomahawk-based)
  • Spine Switches:
    • 400G switches (Tomahawk-based)
  • All switches: Pure L3 routers, no L2 switching
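
As a quick sanity check on the hardware above, the sketch below tallies the per-rack, per-ToR port budget for ToR option 1; the spine uplink count is an assumption for illustration, not part of the spec.

```python
# Rough per-rack, per-ToR port-budget check for ToR option 1 (64 x 100G).
# Assumptions (not from the spec): 4 spine uplinks per ToR, and every server
# lands exactly one 100G NIC on each ToR (eth0 -> ToR-A, eth1 -> ToR-B).

SERVERS_PER_RACK = 25        # ~25 servers/rack (section 7.2)
UPLINKS_PER_TOR = 4          # assumed spine uplink count
TOR_PORTS_100G = 64          # option 1; option 2 (32 x 200G) would need breakout

used = SERVERS_PER_RACK + UPLINKS_PER_TOR
print(f"ports used per ToR: {used}/{TOR_PORTS_100G} ({TOR_PORTS_100G - used} spare)")

# Per-server aggregate: 2 NICs x 100G = 200G, matching section 7.3.1.
print("per-server aggregate:", 2 * 100, "Gbps")
```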

7.4 Hierarchical IP Addressing

The architecture uses hierarchical addressing where IP addresses encode device role, pod number, and rack location.

7.4.1 Loopback IPs (Device Identity)

7.4.1.1 Network Devices

Spines:

  • Pattern: 10.254.{pod}.{spine}/32
  • Examples:
    • NP1 Spine 1 = 10.254.1.1/32
    • NP2 Spine 1 = 10.254.2.1/32

ToRs:

  • Pattern: 10.254.{pod-rack}.{tor}/32
  • Examples:
    • NP1 Rack 1 ToR-A = 10.254.11.11/32
    • NP1 Rack 2 ToR-B = 10.254.12.12/32

Super-Spines (when deployed):

  • Pattern: 10.254.100.{superspine}/32
  • Examples:
    • SuperSpine-1 = 10.254.100.1/32
    • SuperSpine-2 = 10.254.100.2/32

7.4.1.2 Host Encapsulation Loopbacks (TEP IPs)

Hosts:

  • Pattern: 10.255.{pod-rack}.{host}/32
  • Examples:
    • NP1 Rack 1 Host 1 = 10.255.11.11/32
    • NP2 Rack 1 Host 2 = 10.255.21.12/32
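
The sketch below encodes the patterns in 7.4.1. Two details are inferred from the examples rather than stated outright: {pod-rack} is the pod digit followed by the rack digit (pod 1, rack 2 → 12), and ToR/host numbers are offset by 10 in the last octet (ToR-A → 11, Host 2 → 12).

```python
# A minimal sketch of the 7.4.1 loopback scheme (single-digit pods/racks assumed).
# Inferred from the examples, not stated explicitly:
#   * "{pod-rack}" = pod digit followed by rack digit (pod 1, rack 2 -> 12)
#   * ToR and host numbers are offset by 10 in the last octet

def spine_loopback(pod: int, spine: int) -> str:
    """Spine loopback, e.g. NP1 Spine 1 -> 10.254.1.1/32."""
    return f"10.254.{pod}.{spine}/32"

def tor_loopback(pod: int, rack: int, tor: str) -> str:
    """ToR loopback, e.g. NP1 Rack 2 ToR-B -> 10.254.12.12/32."""
    tor_index = {"A": 1, "B": 2}[tor]
    return f"10.254.{pod}{rack}.{10 + tor_index}/32"

def host_tep_loopback(pod: int, rack: int, host: int) -> str:
    """Host GENEVE TEP loopback, e.g. NP2 Rack 1 Host 2 -> 10.255.21.12/32."""
    return f"10.255.{pod}{rack}.{10 + host}/32"

def superspine_loopback(superspine: int) -> str:
    """Super-spine loopback, e.g. SuperSpine-1 -> 10.254.100.1/32."""
    return f"10.254.100.{superspine}/32"

assert tor_loopback(1, 1, "A") == "10.254.11.11/32"
assert host_tep_loopback(2, 1, 2) == "10.255.21.12/32"
```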

7.5 Concrete IP Plan (Current Phase)

7.5.1 A) Host Loopbacks (OVN GENEVE TEP IPs)

Reserve: 10.0.0.0/16 for host loopbacks (server identity)

Allocate per rack:

  • Rack1: 10.0.1.0/24
  • Rack2: 10.0.2.0/24
  • Rack3: 10.0.3.0/24
  • Rack4: 10.0.4.0/24
  • Rack5: 10.0.5.0/24
  • Rack6: 10.0.6.0/24

Each host gets one /32, e.g., 10.0.1.11/32.

Key: Loopback is independent of physical links. It’s advertised via BGP through both NICs, creating equal-cost paths automatically.
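
A minimal sketch of this allocation using Python's standard ipaddress module; the .11 starting offset for host numbering is an assumption based on the 10.0.1.11/32 example, not a stated rule.

```python
# Carve 10.0.{rack}.0/24 per rack from 10.0.0.0/16 and hand each host a /32.
import ipaddress

HOST_LOOPBACK_BLOCK = ipaddress.ip_network("10.0.0.0/16")
RACKS = range(1, 7)            # Rack1..Rack6
HOSTS_PER_RACK = 25            # ~25 servers/rack

def rack_block(rack: int) -> ipaddress.IPv4Network:
    """Per-rack /24, e.g. Rack1 -> 10.0.1.0/24."""
    net = ipaddress.ip_network(f"10.0.{rack}.0/24")
    assert net.subnet_of(HOST_LOOPBACK_BLOCK)
    return net

def host_loopback(rack: int, host: int) -> ipaddress.IPv4Network:
    """Per-host /32, e.g. Rack1 host 1 -> 10.0.1.11/32 (assumed .11 offset)."""
    return ipaddress.ip_network(f"10.0.{rack}.{10 + host}/32")

for rack in RACKS:
    print(rack_block(rack), "->",
          host_loopback(rack, 1), "...", host_loopback(rack, HOSTS_PER_RACK))
```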

7.5.4 D) Switch Loopbacks

Reserve:

  • 10.255.0.0/16 for spine loopbacks
  • 10.254.0.0/16 for ToR loopbacks

Examples:

  • Spines: 10.255.0.1/32, 10.255.0.2/32, …
  • ToRs: per rack, e.g., 10.254.1.1/32 (ToR-A), 10.254.1.2/32 (ToR-B)

7.6 Network Topology

7.6.1 Complete Network Diagram

The following diagram shows the complete leaf-spine topology with three racks, two spines, and the Independent A/B Fabrics architecture:

[Figure: Complete network topology diagram]

Key Features:

  • Spine Layer: Two spine switches (Spine 1 and Spine 2) provide redundant paths
  • ToR Switches: Each rack has two ToRs (ToR-A and ToR-B) in separate fabrics
  • Server Connections: Each server connects to both ToRs via separate NICs (eth0 to ToR-A, eth1 to ToR-B)
  • Point-to-Point Links: All links use /31 addressing (RFC 3021); see the sketch after this list
  • BGP Routing: All devices peer via eBGP and advertise loopback IPs
  • ECMP: Traffic automatically distributes across multiple equal-cost paths
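
As a sketch of the /31 point-to-point numbering (RFC 3021): the 172.16.0.0/16 parent block matches the 172.16.x.x/31 addresses used in 7.6.3, but the block size, link ordering, and which end takes the lower address are illustrative assumptions.

```python
# Hedged sketch of /31 point-to-point numbering (RFC 3021) for fabric links.
# The 172.16.0.0/16 reservation and the switch/host ordering are assumptions.
import ipaddress
from itertools import islice

P2P_BLOCK = ipaddress.ip_network("172.16.0.0/16")   # assumed parent block

def p2p_links(block):
    """Yield (switch_end, host_end) address pairs, one /31 per physical link."""
    for subnet in block.subnets(new_prefix=31):
        low, high = subnet                # a /31 contains exactly two addresses
        yield f"{low}/31", f"{high}/31"

# First few links, e.g. ToR end 172.16.0.0/31 <-> server end 172.16.0.1/31
for tor_end, server_end in islice(p2p_links(P2P_BLOCK), 3):
    print(tor_end, "<->", server_end)
```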

7.6.2 Unified Fabric Topology

                    [Spine-1]           [Spine-2]
                    /  |  |  \         /  |  |  \
                   /   |  |   \       /   |  |   \
                  /    |  |    \     /    |  |    \
            [ToR-A1][ToR-B1][ToR-A2][ToR-B2][ToR-A3][ToR-B3]
                |      |       |      |       |      |
                |      |       |      |       |      |
            [Host-1]      [Host-2]      [Host-3]
            eth0 eth1     eth0 eth1     eth0 eth1

All ToRs connect to all Spines - single unified L3 fabric
Each host connects to both ToRs in its rack via separate NICs
ECMP distributes traffic across all available paths

Key Features:

  • Single routing domain: All ToRs and spines form one fabric, either in the same BGP AS or connected via eBGP
  • Full connectivity: Every ToR connects to every spine
  • Maximum ECMP: 8+ possible paths between any two hosts (2 source NICs × 2 spines × 2 destination ToRs)
  • No MLAG: ToR-A and ToR-B are independent, no peer-link

7.6.3 Host Multi-NIC Configuration

Each host has two separate routed interfaces connecting to dual ToRs in its rack:

  • eth0 → ToR-A (first ToR in rack) at 100G
    • Own IP: 172.16.x.x/31 (point-to-point)
    • Advertises loopback via eBGP
  • eth1 → ToR-B (second ToR in rack) at 100G
    • Own IP: 172.16.x.x/31 (point-to-point)
    • Advertises same loopback via eBGP
  • Loopback (10.0.x.y/32) = Server identity / OVN TEP
    • Advertised via BOTH NICs with equal BGP attributes
    • Creates equal-cost paths → ECMP across all ToRs and spines
  • Result: 200G aggregate bandwidth with 8+ ECMP paths in unified fabric

Path diversity: Traffic from H1 to H2 can use any combination of 2 source NICs (each pinned to one local ToR) × 2+ spines × 2 destination ToRs = 8+ paths.
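
The short enumeration below makes that arithmetic concrete; the device names are illustrative and the spine count is fixed at two for the current phase.

```python
# Enumerate host-to-host ECMP paths in the unified fabric, per the
# multi-NIC description above. Names are illustrative only.
from itertools import product

# Each source NIC is pinned to one local ToR, so NIC choice and local ToR
# are a single decision, not two independent ones.
src_nic_tor = [("eth0", "ToR-A1"), ("eth1", "ToR-B1")]
spines      = ["Spine-1", "Spine-2"]        # two spines in the current phase
dst_tors    = ["ToR-A2", "ToR-B2"]

paths = [
    (nic, local_tor, spine, remote_tor)
    for (nic, local_tor), spine, remote_tor in product(src_nic_tor, spines, dst_tors)
]
print(len(paths), "equal-cost paths")       # 2 x 2 x 2 = 8 with two spines
```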

For configuration scripts, see Configuration Examples.

7.7 Important Note on Summarization

You can summarize later (e.g., advertise 10.0.1.0/24 per rack instead of all /32s), but only if you preserve correctness.

If you summarize the /24 from both ToR-A and ToR-B and a host loses its link to ToR-A, ToR-A no longer knows that host’s /32. The spines, however, may still send traffic for that host to ToR-A because of the /24 summary, creating a potential blackhole unless:

  • you have a ToR-A ↔︎ ToR-B L3 interconnect to forward internally, or
  • you avoid summarizing and keep /32s in the core (recommended for now)

Given your size, don’t summarize yet. Keep /32s end-to-end. Revisit summarization when you’re at “many racks / many thousands of hosts” and after confirming FIB scale on the exact ToR/spine models.
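
The toy longest-prefix-match lookup below illustrates the failure mode: with only the /24 summary in a spine's table, traffic for a host that has lost its ToR-A link can still be hashed toward ToR-A, whereas a per-host /32 steers it to the surviving ToR. The table contents are illustrative, not a real FIB.

```python
# Toy longest-prefix-match lookup to show why /32s matter when summarizing.
import ipaddress

def lpm(fib, dst):
    """Return the next-hop set for the longest matching prefix (toy lookup)."""
    dst_ip = ipaddress.ip_address(dst)
    matches = [p for p in fib if dst_ip in ipaddress.ip_network(p)]
    if not matches:
        return None
    return fib[max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen)]

# Spine table with /32s kept: host 10.0.1.11 lost its ToR-A link, so only
# ToR-B still originates its /32.
fib_with_host_routes = {
    "10.0.1.0/24":  ["ToR-A1", "ToR-B1"],   # rack summary (if advertised)
    "10.0.1.11/32": ["ToR-B1"],             # host reachable via ToR-B only
}
# Spine table with summaries only: the /32 is gone, the /24 points both ways.
fib_summary_only = {
    "10.0.1.0/24":  ["ToR-A1", "ToR-B1"],
}

print(lpm(fib_with_host_routes, "10.0.1.11"))  # ['ToR-B1'] -> traffic survives
print(lpm(fib_summary_only, "10.0.1.11"))      # both ToRs -> half blackholed
```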

7.8 Egress Racks (Border/F5)

If you have 2 racks with dual F5 load balancers:

Treat them as “border racks”:

  • Border ToRs connect to F5s and upstreams
  • F5 ownership model (VIP1 active on A, VIP2 active on B) works well if the owning F5 advertises the VIP /32 into the fabric, so return traffic stays symmetric

7.9 References