5  Network Design & IP Addressing

5.1 Overview

This chapter provides the concrete implementation details for IP addressing and network topology. For architectural principles, see Network Architecture Overview.

5.2 Scale & Capacity

Start: 3 racks
Scale to: 16 racks maximum
Servers per rack: ~25 servers
Total capacity: Up to 400 servers

Hardware: 2× 32-port 400G spine switches. Each rack has 2 ToRs, each ToR connects to both spines = 4 spine ports per rack. With 64 total spine ports (2×32), we support 16 racks maximum.

Future expansion beyond 16 racks: add a super-spine layer with Network Pods. Each pod supports 16 racks, and the super-spine interconnects multiple pods, so the fabric can grow pod by pod.
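The port arithmetic above can be sanity-checked in a few lines. This is a sketch; the constants mirror the hardware described in this section:

```python
# Rack ceiling check: each rack consumes 4 spine-facing ports
# (2 ToRs, each uplinked to both spines), and two 32-port 400G
# spines provide 64 ports in total.
SPINES = 2
PORTS_PER_SPINE = 32
SPINE_PORTS_PER_RACK = 4  # 2 ToRs x 2 spine uplinks each

max_racks = (SPINES * PORTS_PER_SPINE) // SPINE_PORTS_PER_RACK
print(max_racks)  # 16
```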

5.3 Address Blocks (What Each Range Is For)

Our IP allocation uses distinct ranges for different purposes:

  • 10.254.0.0/16 → Device loopbacks (stable IDs)
    • ToR lo0, Spine lo0, SuperSpine lo0
  • 10.255.0.0/16 → Host loopbacks / GENEVE encap IPs (stable host identity)
  • 172.16.0.0/16 → Host↔︎ToR point-to-point links (/31)
  • 172.20.0.0/16 → ToR↔︎Spine point-to-point links (/31)
  • 172.24.0.0/16 → Spine↔︎SuperSpine point-to-point links (/31) (future multi-pod)

5.3.2 Quick Debugging Tips

When you see an IP address in logs or routing tables:

  • 10.254.* → Network device loopback (ToR, Spine, SuperSpine)
  • 10.255.* → Host identity / GENEVE TEP
  • 172.16.* → Host↔︎ToR adjacency
  • 172.20.* → ToR↔︎Spine adjacency
  • 172.24.* → Spine↔︎SuperSpine adjacency

Each prefix immediately identifies the layer and role an IP serves.
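The table above translates directly into a small lookup helper for log triage. A minimal sketch using only the standard library; the function name and the exact role strings are illustrative:

```python
import ipaddress

# Range-to-role mapping taken from the address plan in 5.3.
RANGES = {
    "10.254.0.0/16": "network device loopback (ToR/Spine/SuperSpine)",
    "10.255.0.0/16": "host identity / GENEVE TEP",
    "172.16.0.0/16": "Host-ToR adjacency (/31)",
    "172.20.0.0/16": "ToR-Spine adjacency (/31)",
    "172.24.0.0/16": "Spine-SuperSpine adjacency (/31)",
}

def classify(ip: str) -> str:
    """Return the fabric role of an address seen in logs or route tables."""
    addr = ipaddress.ip_address(ip)
    for net, role in RANGES.items():
        if addr in ipaddress.ip_network(net):
            return role
    return "unknown / outside the fabric plan"

print(classify("172.20.3.1"))    # ToR-Spine adjacency (/31)
print(classify("10.255.11.11"))  # host identity / GENEVE TEP
```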

5.4 Deterministic /31 Allocation Scheme (No Spreadsheet Needed)

For any rack with pod-rack number PR (e.g., 11, 12, 13…):

Rack PR allocation: 172.16.PR.0/24

  • ToR-A half: 172.16.PR.0/25 (addresses 0-127)
  • ToR-B half: 172.16.PR.128/25 (addresses 128-255)

For host index h (0 to 63, supporting up to 64 hosts per ToR):

Host-to-ToR-A /31: 172.16.PR.(2*h)/31

  • Host uses the even address: 172.16.PR.(2*h)
  • ToR-A uses the odd address: 172.16.PR.(2*h + 1)

Host-to-ToR-B /31: 172.16.PR.(128 + 2*h)/31

  • Host uses the even address: 172.16.PR.(128 + 2*h)
  • ToR-B uses the odd address: 172.16.PR.(128 + 2*h + 1)

5.4.1 Example: NP1 Rack1 Host11

Pod-rack PR = 11, host index h = 11:

eth0↔︎ToR-A:

  • /31 subnet: 172.16.11.(2*11)/31 = 172.16.11.22/31
  • Host11 eth0: 172.16.11.22
  • ToR-A: 172.16.11.23

eth1↔︎ToR-B:

  • /31 subnet: 172.16.11.(128 + 2*11)/31 = 172.16.11.150/31
  • Host11 eth1: 172.16.11.150
  • ToR-B: 172.16.11.151

This deterministic scheme eliminates the need for IP allocation spreadsheets—you can calculate any host’s IPs from pod, rack, and host numbers.
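The scheme can be written down as a short function. A sketch assuming the pod-rack number PR and host index h as defined above; the function name is illustrative:

```python
def host_tor_links(pr: int, h: int) -> dict:
    """Deterministic /31 allocation for rack PR, host index h (0-63)."""
    if not 0 <= h <= 63:
        raise ValueError("host index must be 0-63")
    a = 2 * h        # host-side last octet toward ToR-A (lower /25)
    b = 128 + 2 * h  # host-side last octet toward ToR-B (upper /25)
    return {
        "eth0_host":  f"172.16.{pr}.{a}",      # even address
        "eth0_tor_a": f"172.16.{pr}.{a + 1}",  # odd address
        "eth1_host":  f"172.16.{pr}.{b}",
        "eth1_tor_b": f"172.16.{pr}.{b + 1}",
    }

# Reproduces the NP1 Rack1 Host11 example from 5.4.1:
links = host_tor_links(11, 11)
assert links["eth0_host"] == "172.16.11.22"
assert links["eth0_tor_a"] == "172.16.11.23"
assert links["eth1_host"] == "172.16.11.150"
assert links["eth1_tor_b"] == "172.16.11.151"
```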

5.5 Hardware Specifications

5.5.1 Server NICs

  • 2 × 100G NICs per server (NVIDIA ConnectX-6 DX)
  • Hardware GENEVE offload enabled
  • Total aggregate: 200G per server via pure L3 ECMP

5.5.2 Switch Hardware

  • ToR Switches:
    • Option 1: 100G × 64 ports (Tomahawk-based)
    • Option 2: 200G × 32 ports (Tomahawk-based)
  • Spine Switches:
    • 400G switches (Tomahawk-based)
  • All switches: Pure L3 routers, no L2 switching

5.6 Hierarchical IP Addressing

The architecture uses hierarchical addressing where IP addresses encode device role, pod number, and rack location.

5.6.1 Loopback IPs (Device Identity)

5.6.1.1 Network Devices

Spines:

  • Pattern: 10.254.{pod}.{spine}/32
  • Examples:
    • NP1 Spine 1 = 10.254.1.1/32
    • NP2 Spine 1 = 10.254.2.1/32

ToRs:

  • Pattern: 10.254.{pod-rack}.{tor}/32
  • Examples:
    • NP1 Rack 1 ToR-A = 10.254.11.11/32
    • NP1 Rack 2 ToR-B = 10.254.12.12/32

Super-Spines (when deployed):

  • Pattern: 10.254.100.{superspine}/32
  • Examples:
    • SuperSpine-1 = 10.254.100.1/32
    • SuperSpine-2 = 10.254.100.2/32

5.6.1.2 Host Encapsulation Loopbacks (TEP IPs)

Hosts:

  • Pattern: 10.255.{pod-rack}.{host}/32
  • Examples:
    • NP1 Rack 1 Host 11 = 10.255.11.11/32
    • NP2 Rack 1 Host 12 = 10.255.21.12/32
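The loopback patterns above can be sketched as code. Here pod-rack is passed directly as the two-digit number used throughout the examples (e.g., NP1 Rack2 = 12); the function names are illustrative:

```python
def spine_loopback(pod: int, spine: int) -> str:
    # 10.254.{pod}.{spine}/32
    return f"10.254.{pod}.{spine}/32"

def tor_loopback(pod_rack: int, tor: str) -> str:
    # 10.254.{pod-rack}.{tor}/32 with ToR-A = 11, ToR-B = 12 (see 5.7.4)
    tor_id = {"A": 11, "B": 12}[tor]
    return f"10.254.{pod_rack}.{tor_id}/32"

def host_tep(pod_rack: int, host: int) -> str:
    # 10.255.{pod-rack}.{host}/32
    return f"10.255.{pod_rack}.{host}/32"

assert spine_loopback(1, 1) == "10.254.1.1/32"
assert tor_loopback(12, "B") == "10.254.12.12/32"
assert host_tep(11, 11) == "10.255.11.11/32"
```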

5.7 Concrete IP Plan (Current Phase: NP1 Only)

Note: This uses the hierarchical addressing scheme. For future expansion with multiple Network Pods and super-spine, see the hierarchical plan above.

5.7.1 A) Host Loopbacks (OVN GENEVE TEP IPs)

Reserve: 10.255.0.0/16 for host encapsulation loopbacks (TEP IPs)

Allocate per rack (current deployment = NP1 only):

  • NP1 Rack1: 10.255.11.0/24
  • NP1 Rack2: 10.255.12.0/24
  • NP1 Rack3: 10.255.13.0/24
  • NP1 Rack4: 10.255.14.0/24
  • NP1 Rack5: 10.255.15.0/24
  • NP1 Rack6: 10.255.16.0/24

Each host gets one /32, e.g., 10.255.11.11/32 (NP1 Rack1 Host11).

Key: Loopback is independent of physical links. It’s advertised via BGP through both NICs to both ToRs, creating equal-cost paths automatically through the unified fabric.

5.7.4 D) Switch Loopbacks

Reserve: 10.254.0.0/16 for all network device loopbacks

Current deployment (NP1 only):

  • Spines: 10.254.1.{id}/32
    • Spine-1: 10.254.1.1/32
    • Spine-2: 10.254.1.2/32
  • ToRs: 10.254.{pod-rack}.{11|12}/32 (A=11, B=12)
    • Rack1 ToR-A: 10.254.11.11/32
    • Rack1 ToR-B: 10.254.11.12/32
    • Rack2 ToR-A: 10.254.12.11/32
    • Rack2 ToR-B: 10.254.12.12/32

Future (with super-spine):

  • Super-Spines: 10.254.100.{id}/32
    • SuperSpine-1: 10.254.100.1/32
    • SuperSpine-2: 10.254.100.2/32

5.8 Network Topology

5.8.1 Complete Network Diagram

The following diagram shows the complete leaf-spine topology with dual ToRs per rack in a single unified L3 Clos fabric (Network Pod 1):

Single Network Pod - Leaf-Spine Topology

Key Features:

  • Single routing domain: all ToRs and spines connected via eBGP in one unified fabric
  • Full connectivity: every ToR connects to every spine (Clos topology)
  • Maximum ECMP: 8 paths between any two hosts (2 NICs × 2 ToRs × 2 spines)
  • No MLAG: ToR-A and ToR-B are independent, with no peer-link

5.8.2 Host Multi-NIC Configuration

Each host has two separate routed interfaces connecting to dual ToRs in its rack:

  • eth0 → ToR-A (first ToR in rack) at 100G
    • Own IP: 172.16.{pod-rack}.x/31 (point-to-point)
    • Advertises loopback via eBGP
  • eth1 → ToR-B (second ToR in rack) at 100G
    • Own IP: 172.16.{pod-rack}.y/31 (point-to-point)
    • Advertises same loopback via eBGP
  • Loopback (10.255.{pod-rack}.{host}/32) = Server identity / OVN TEP
    • Example: NP1 Rack1 Host11 = 10.255.11.11/32
    • Advertised via BOTH NICs with equal BGP attributes
    • Creates equal-cost paths → ECMP across all ToRs and spines in the fabric
  • Result: 200G aggregate bandwidth with 8+ ECMP paths in unified fabric

Path diversity: traffic from H1 to H2 can use any combination of 2 source NICs (eth0 or eth1) × 2 spines × 2 destination ToRs, for a minimum of 8 distinct paths.

For configuration scripts, see Configuration Examples.

5.9 Important Note on Summarization

You can summarize later (e.g., advertise 10.255.11.0/24 per rack instead of all /32s), but only if you keep correctness.

If you summarize /24 from both ToR-A and ToR-B, and a host loses its link to ToR-A, ToR-A may no longer know that host’s /32 — but spines might still send traffic for that host to ToR-A because of the /24 summary → potential blackhole unless:

  • you have a ToR-A ↔︎ ToR-B L3 interconnect to forward internally, or
  • you avoid summarizing and keep /32s in the core (recommended for now)

Given your size, don’t summarize yet. Keep /32s end-to-end. Revisit summarization when you’re at “many racks / many thousands of hosts” and after confirming FIB scale on the exact ToR/spine models.

5.10 Egress Racks (Border/F5)

If you have 2 racks with dual F5 load balancers:

Treat them as “border racks”:

  • Border ToRs connect to the F5s and upstreams
  • The F5 ownership model (VIP1 active on A, VIP2 active on B) works well if the owning F5 advertises the VIP /32 into the fabric, so return traffic stays symmetric

5.11 References