5 Network Design & IP Addressing
5.1 Overview
This chapter provides the concrete implementation details for IP addressing and network topology. For architectural principles, see Network Architecture Overview.
5.2 Scale & Capacity
- Start: 3 racks
- Scale to: 16 racks maximum
- Servers per rack: ~25 servers
- Total capacity: up to 400 servers
- Hardware: 2× 32-port 400G spine switches. Each rack has 2 ToRs; each ToR connects to both spines, so a rack consumes 4 spine ports (2 per spine). With 64 total spine ports (2×32), we support 16 racks maximum.
- Future expansion beyond 16 racks: add a super-spine layer with Network Pods. Each pod supports 16 racks; the super-spine interconnects multiple pods for scaling beyond a single pod.
5.3 Address Blocks (What Each Range Is For)
Our IP allocation uses distinct ranges for different purposes:
- 10.254.0.0/16 → Device loopbacks (stable IDs)
  - ToR lo0, Spine lo0, SuperSpine lo0
- 10.255.0.0/16 → Host loopbacks / GENEVE encap IPs (stable host identity)
- 172.16.0.0/16 → Host↔︎ToR point-to-point links (/31)
- 172.20.0.0/16 → ToR↔︎Spine point-to-point links (/31)
- 172.24.0.0/16 → Spine↔︎SuperSpine point-to-point links (/31) (future multi-pod)
5.3.1 Why We Keep P2P Link Ranges Separate From Loopbacks
Loopbacks are stable identities advertised widely; they should remain distinct and easy to recognize. P2P link IPs are “plumbing” for adjacency and may be numerous (/31 per physical link).
Separating ranges makes debugging and automation easier—at a glance you know whether an IP is identity vs adjacency. It avoids confusion and accidental policy mistakes (e.g., route filters or ACLs that target loopbacks vs link nets).
This is an operational convention; routing does not require it, but it improves clarity.
5.3.2 Quick Debugging Tips
When you see an IP address in logs or routing tables:
- 10.254.* → Network device loopback (ToR, Spine, SuperSpine)
- 10.255.* → Host identity / GENEVE TEP
- 172.16.* → Host↔︎ToR adjacency
- 172.20.* → ToR↔︎Spine adjacency
- 172.24.* → Spine↔︎SuperSpine adjacency
Instant recognition of what layer and role each IP serves.
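This prefix-to-role lookup is mechanical enough to script. A minimal sketch (the `ROLES` table and `classify` name are illustrative, not part of any existing tooling):

```python
import ipaddress

# Prefix-to-role map taken from the address plan above.
ROLES = {
    "10.254.0.0/16": "network device loopback",
    "10.255.0.0/16": "host identity / GENEVE TEP",
    "172.16.0.0/16": "host-to-ToR adjacency",
    "172.20.0.0/16": "ToR-to-spine adjacency",
    "172.24.0.0/16": "spine-to-superspine adjacency",
}

def classify(ip: str) -> str:
    """Return the role of an address, or 'unknown' if outside the plan."""
    addr = ipaddress.ip_address(ip)
    for prefix, role in ROLES.items():
        if addr in ipaddress.ip_network(prefix):
            return role
    return "unknown"
```

Handy when grepping logs: pipe unknown addresses through `classify` to spot traffic that falls outside the plan.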
5.4 Deterministic /31 Allocation Scheme (No Spreadsheet Needed)
For any rack with pod-rack number PR (e.g., 11, 12, 13…):
Rack PR allocation: 172.16.PR.0/24
- ToR-A half: 172.16.PR.0/25 (addresses 0-127)
- ToR-B half: 172.16.PR.128/25 (addresses 128-255)
For host index h (0 to 63, supporting up to 64 hosts per ToR):
Host-to-ToR-A /31: 172.16.PR.(2*h)/31
- Host uses even address: 172.16.PR.(2*h)
- ToR-A uses odd address: 172.16.PR.(2*h + 1)

Host-to-ToR-B /31: 172.16.PR.(128 + 2*h)/31
- Host uses even address: 172.16.PR.(128 + 2*h)
- ToR-B uses odd address: 172.16.PR.(128 + 2*h + 1)
5.4.1 Example: NP1 Rack1 Host11
Pod-rack PR = 11, host index h = 11:
eth0↔︎ToR-A:
- /31 subnet: 172.16.11.(2*11)/31 = 172.16.11.22/31
- Host11 eth0: 172.16.11.22
- ToR-A: 172.16.11.23

eth1↔︎ToR-B:
- /31 subnet: 172.16.11.(128 + 2*11)/31 = 172.16.11.150/31
- Host11 eth1: 172.16.11.150
- ToR-B: 172.16.11.151
This deterministic scheme eliminates the need for IP allocation spreadsheets—you can calculate any host’s IPs from pod, rack, and host numbers.
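The scheme above fits in a few lines of code; a sketch (function name is illustrative):

```python
def host_p2p(pr: int, h: int) -> dict:
    """Compute the two /31 links for host index h (0-63) in rack pod-rack PR.

    The ToR-A half uses offsets 0-127 and the ToR-B half 128-255; the host
    always takes the even address of each /31 and the ToR the odd one.
    """
    if not 0 <= h <= 63:
        raise ValueError("host index must be 0-63")
    a, b = 2 * h, 128 + 2 * h
    return {
        "eth0": (f"172.16.{pr}.{a}", f"172.16.{pr}.{a + 1}"),  # (host, ToR-A)
        "eth1": (f"172.16.{pr}.{b}", f"172.16.{pr}.{b + 1}"),  # (host, ToR-B)
    }
```

For the worked example above (PR = 11, h = 11), `host_p2p(11, 11)` yields 172.16.11.22/.23 on eth0 and 172.16.11.150/.151 on eth1.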
5.5 Hardware Specifications
5.5.1 Server NICs
- 2 × 100G NICs per server (NVIDIA ConnectX-6 DX)
- Hardware GENEVE offload enabled
- Total aggregate: 200G per server via pure L3 ECMP
5.5.2 Switch Hardware
- ToR Switches:
- Option 1: 100G × 64 ports (Tomahawk-based)
- Option 2: 200G × 32 ports (Tomahawk-based)
- Spine Switches:
- 400G switches (Tomahawk-based)
- All switches: Pure L3 routers, no L2 switching
5.6 Hierarchical IP Addressing
The architecture uses hierarchical addressing where IP addresses encode device role, pod number, and rack location.
5.6.1 Loopback IPs (Device Identity)
5.6.1.1 Network Devices
Spines:
- Pattern: 10.254.{pod}.{spine}/32
- Examples:
  - NP1 Spine 1 = 10.254.1.1/32
  - NP2 Spine 1 = 10.254.2.1/32

ToRs:
- Pattern: 10.254.{pod-rack}.{tor}/32
- Examples:
  - NP1 Rack 1 ToR-A = 10.254.11.11/32
  - NP1 Rack 2 ToR-B = 10.254.12.12/32

Super-Spines (when deployed):
- Pattern: 10.254.100.{superspine}/32
- Examples:
  - SuperSpine-1 = 10.254.100.1/32
  - SuperSpine-2 = 10.254.100.2/32
5.6.1.2 Host Encapsulation Loopbacks (TEP IPs)
Hosts:
- Pattern: 10.255.{pod-rack}.{host}/32
- Examples:
  - NP1 Rack 1 Host 11 = 10.255.11.11/32
  - NP2 Rack 1 Host 12 = 10.255.21.12/32
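The loopback patterns can also be generated programmatically. A sketch, assuming pod-rack = pod*10 + rack (as the examples suggest: NP1 Rack 1 → 11, NP2 Rack 1 → 21) and the A=11/B=12 ToR device IDs used elsewhere in this chapter; the helper names are illustrative:

```python
def pod_rack(pod: int, rack: int) -> int:
    """Pod-rack number, e.g. NP1 Rack 1 -> 11 (assumed pod*10 + rack)."""
    return pod * 10 + rack

def tor_loopback(pod: int, rack: int, tor: str) -> str:
    """Device loopback for a ToR; ToR-A gets id 11, ToR-B gets id 12."""
    return f"10.254.{pod_rack(pod, rack)}.{11 if tor == 'A' else 12}/32"

def host_tep(pod: int, rack: int, host: int) -> str:
    """Host encapsulation loopback (GENEVE TEP identity)."""
    return f"10.255.{pod_rack(pod, rack)}.{host}/32"
```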
5.6.2 Point-to-Point Link IPs
5.6.2.1 Host ↔︎ ToR Links
- Pool: 172.16.{pod-rack}.0/24 per rack
- Split: A/B halves
  - ToR-A side: 172.16.{pod-rack}.0/25
  - ToR-B side: 172.16.{pod-rack}.128/25
- Link Type: /31 (point-to-point, RFC 3021)
- Example: NP1 Rack 1 = 172.16.11.0/24
  - A side: 172.16.11.0/25 (up to 64 hosts)
  - B side: 172.16.11.128/25 (up to 64 hosts)
Example for NP1 Rack 1 Host 11: - Host eth0 ↔︎ ToR-A: 172.16.11.22/31 (host: .22, ToR-A: .23) - Host eth1 ↔︎ ToR-B: 172.16.11.150/31 (host: .150, ToR-B: .151)
5.6.2.2 ToR ↔︎ Spine Links
- Pool: 172.20.{pod}.0/22 per Network Pod
- Link Type: /31 (point-to-point)
- Examples:
  - NP1 = 172.20.1.0/22
  - NP2 = 172.20.2.0/22
5.6.2.3 Spine ↔︎ Super-Spine Links (Future)
- Pool: 172.24.100.0/24
- Link Type: /31 (point-to-point)
- Used when: Super-spine layer is deployed
5.7 Concrete IP Plan (Current Phase: NP1 Only)
Note: This uses the hierarchical addressing scheme. For future expansion with multiple Network Pods and super-spine, see the hierarchical plan above.
5.7.1 A) Host Loopbacks (OVN GENEVE TEP IPs)
Reserve: 10.255.0.0/16 for host encapsulation loopbacks (TEP IPs)
Allocate per rack (current deployment = NP1 only):
- NP1 Rack1: 10.255.11.0/24
- NP1 Rack2: 10.255.12.0/24
- NP1 Rack3: 10.255.13.0/24
- NP1 Rack4: 10.255.14.0/24
- NP1 Rack5: 10.255.15.0/24
- NP1 Rack6: 10.255.16.0/24
Each host gets one /32, e.g., 10.255.11.11/32 (NP1 Rack1 Host11).
Key: Loopback is independent of physical links. It’s advertised via BGP through both NICs to both ToRs, creating equal-cost paths automatically through the unified fabric.
5.7.2 B) Host ↔︎ ToR Routed Links
Reserve: 172.16.0.0/16 for host uplinks
Per rack allocate one /24, split into A/B halves:
- NP1 Rack1: 172.16.11.0/24
  - ToR-A side: 172.16.11.0/25 (up to 64 /31 links = 64 hosts)
  - ToR-B side: 172.16.11.128/25
- NP1 Rack2: 172.16.12.0/24
  - ToR-A side: 172.16.12.0/25
  - ToR-B side: 172.16.12.128/25
- NP1 Rack3: 172.16.13.0/24
  - ToR-A side: 172.16.13.0/25
  - ToR-B side: 172.16.13.128/25
- (Similar pattern for racks 4-6)
Each host uses two /31s (one to ToR-A, one to ToR-B).
Example for NP1 Rack1 Host 11 (using the deterministic scheme above):
- host eth0↔︎ToR-A: 172.16.11.22/31 (host: .22, ToR-A: .23)
- host eth1↔︎ToR-B: 172.16.11.150/31 (host: .150, ToR-B: .151)
Key: Each NIC has its own IP on a different point-to-point link. No bonding - pure L3 routing.
5.7.3 C) ToR ↔︎ Spine Links
Reserve: 172.20.1.0/22 for ToR↔︎Spine links (NP1)
Every ToR connects to every spine (full mesh between leaf and spine layers):
- ToR-A (Rack1) → Spine-1: 172.20.1.0/31
- ToR-A (Rack1) → Spine-2: 172.20.1.2/31
- ToR-B (Rack1) → Spine-1: 172.20.1.4/31
- ToR-B (Rack1) → Spine-2: 172.20.1.6/31
- (Continue for all ToRs in all racks)
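The Rack1 enumeration above suggests a deterministic layout for the whole pod: number the links rack-major, then ToR (A before B), then spine, and give each link the next /31. A sketch under that ordering assumption (the function name is illustrative; the ordering beyond Rack1 is an extrapolation, not confirmed elsewhere in this chapter):

```python
def tor_spine_link(pod: int, rack: int, tor: str, spine: int) -> str:
    """Base /31 for a ToR->Spine link (rack and spine are 1-based).

    Link ordering assumption: rack-major, then ToR (A before B), then spine,
    matching the Rack1 examples. With 16 racks x 2 ToRs x 2 spines = 64
    links, all base addresses stay within 172.20.{pod}.0-126.
    """
    tor_idx = 0 if tor == "A" else 1
    link = ((rack - 1) * 2 + tor_idx) * 2 + (spine - 1)
    return f"172.20.{pod}.{2 * link}/31"
```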
5.7.4 D) Switch Loopbacks
Reserve: 10.254.0.0/16 for all network device loopbacks
Current deployment (NP1 only):
- Spines: 10.254.1.{id}/32
  - Spine-1: 10.254.1.1/32
  - Spine-2: 10.254.1.2/32
- ToRs: 10.254.{pod-rack}.{11|12}/32 (A=11, B=12)
  - Rack1 ToR-A: 10.254.11.11/32
  - Rack1 ToR-B: 10.254.11.12/32
  - Rack2 ToR-A: 10.254.12.11/32
  - Rack2 ToR-B: 10.254.12.12/32
Future (with super-spine):
- Super-Spines: 10.254.100.{id}/32
  - SuperSpine-1: 10.254.100.1/32
  - SuperSpine-2: 10.254.100.2/32
5.8 Network Topology
5.8.1 Complete Network Diagram
The following diagram shows the complete leaf-spine topology with dual ToRs per rack in a single unified L3 Clos fabric (Network Pod 1):
Key Features:
- Single routing domain: all ToRs and spines connected via eBGP in a unified fabric
- Full connectivity: every ToR connects to every spine (Clos topology)
- Maximum ECMP: 8+ possible paths between any two hosts (2 NICs × 2 ToRs × 2 Spines)
- No MLAG: ToR-A and ToR-B are independent, no peer-link
5.8.2 Host Multi-NIC Configuration
Each host has two separate routed interfaces connecting to dual ToRs in its rack:
- eth0 → ToR-A (first ToR in rack) at 100G
  - Own IP: 172.16.{pod-rack}.x/31 (point-to-point)
  - Advertises loopback via eBGP
- eth1 → ToR-B (second ToR in rack) at 100G
  - Own IP: 172.16.{pod-rack}.y/31 (point-to-point)
  - Advertises same loopback via eBGP
- Loopback (10.255.{pod-rack}.{host}/32) = server identity / OVN TEP
  - Example: NP1 Rack1 Host11 = 10.255.11.11/32
  - Advertised via BOTH NICs with equal BGP attributes
  - Creates equal-cost paths → ECMP across all ToRs and spines in the fabric
- Result: 200G aggregate bandwidth with 8+ ECMP paths in the unified fabric
Path diversity: Traffic from H1 to H2 can use any combination of:
- 2 source NICs (eth0 or eth1) × 2 spines × 2 destination ToRs = 8 paths minimum
For configuration scripts, see Configuration Examples.
5.9 Important Note on Summarization
You can summarize later (e.g., advertise 10.255.11.0/24 per rack instead of all /32s), but only if you keep correctness.
If you summarize /24 from both ToR-A and ToR-B, and a host loses its link to ToR-A, ToR-A may no longer know that host’s /32 — but spines might still send traffic for that host to ToR-A because of the /24 summary → potential blackhole unless:
- you have a ToR-A ↔︎ ToR-B L3 interconnect to forward internally, or
- you avoid summarizing and keep /32s in the core (recommended for now)
Given your size, don’t summarize yet. Keep /32s end-to-end. Revisit summarization when you’re at “many racks / many thousands of hosts” and after confirming FIB scale on the exact ToR/spine models.
5.10 Egress Racks (Border/F5)
If you have 2 racks with dual F5 load balancers:
Treat them as “border racks”:
- Border ToRs connect to F5s and upstreams
- The F5 ownership model (VIP1 active on A, VIP2 active on B) works well if the owning F5 advertises the VIP /32 into the fabric, so return traffic stays symmetric
5.11 References
- Network Architecture Overview - Architecture principles
- BGP & Routing Configuration - BGP configuration details
- OpenStack Architecture Guide - L3 underlay design principles
- Canonical OpenStack Design Considerations - Canonical’s OpenStack network design
- RFC 3021 - /31 Point-to-Point Links - Point-to-point link addressing