12  Operations & Maintenance

12.1 Overview

This chapter provides operational procedures, troubleshooting guides, and maintenance tasks for the OpenStack DC Network. For architecture details, see Network Architecture Overview.

Note: For definitions of terms used in this chapter, see the Glossary.

12.2 Day-to-Day Operations

12.2.1 Monitoring BGP Sessions

Check BGP session status:

# Show BGP summary (all neighbors)
vtysh -c "show ip bgp summary"

# Expected output: all neighbors Established
# State/PfxRcd shows the number of prefixes received when a session is
# Established, and the state name (Idle/Active/Connect) when it is not

Check specific neighbor:

# Detailed neighbor information
vtysh -c "show ip bgp neighbors 172.16.1.1"

# Check routes received from neighbor
vtysh -c "show ip bgp neighbors 172.16.1.1 routes"

# Check routes advertised to neighbor
vtysh -c "show ip bgp neighbors 172.16.1.1 advertised-routes"

Healthy BGP session indicators:

  • State: Established
  • Uptime: stable (not flapping)
  • PfxRcd: expected number of routes received
  • PfxSnt: advertised routes
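
These checks can be scripted. The sketch below is one way to flag neighbors that are not Established; it assumes a recent FRR that supports JSON output (show ip bgp summary json) and that jq is installed, and the exact JSON field names may differ between FRR versions:

#!/bin/bash
# Flag BGP neighbors that are not in the Established state (sketch).
# Assumes FRR JSON output and jq; field names may vary by FRR version.

vtysh -c "show ip bgp summary json" | jq -r '
  (.ipv4Unicast.peers // {})
  | to_entries[]
  | select(.value.state != "Established")
  | "\(.key) state=\(.value.state)"' \
  | grep . || echo "All neighbors Established"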

12.2.2 Checking OVN Health

OVN Controller status:

# Check OVN controller service
systemctl status ovn-controller

# Check OVN controller logs
journalctl -u ovn-controller -n 100

OVN configuration:

# Verify OVN configuration
ovs-vsctl get open . external-ids

# Expected: ovn-encap-ip=<host-loopback>, ovn-encap-type=geneve

OVN topology:

# Show OVN logical topology
ovn-nbctl show

# Show OVN southbound database
ovn-sbctl show

# List chassis (hosts)
ovn-sbctl list chassis

12.2.3 Verifying ECMP Paths

Check routing table for ECMP:

# View routing table
ip route show

# Look for ECMP routes (multiple nexthop)
ip route show | grep "nexthop"

# Example ECMP route:
# 10.0.2.22 proto bgp metric 20
#   nexthop via 172.16.1.1 dev eth0 weight 1
#   nexthop via 172.16.1.3 dev eth1 weight 1

Check specific destination:

# Route to specific host
ip route get 10.0.2.22

# Shows which path will be used based on 5-tuple hash

12.2.4 Network Connectivity Tests

Test host-to-host connectivity:

# Ping remote host TEP
ping -I 10.0.1.11 10.0.2.22

# Traceroute to see path
traceroute -s 10.0.1.11 10.0.2.22

# Test GENEVE tunnel
# (Traffic should go through automatically via OVN/OVS)

Verify MTU:

# Check interface MTU
ip link show eth0
ip link show eth1

# Underlay MTU should be 9000 (jumbo frames)
# GENEVE adds ~38-50 bytes of overhead, so the underlay MTU must exceed the
# overlay MTU by at least ~50 bytes (e.g., ≥ 1550 underlay for a 1500 overlay)
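
As a quick sanity check, the sketch below reports the configured MTU on both underlay NICs and sends a maximum-size, don't-fragment ping; the remote TEP address 10.0.2.22 is an example from this chapter:

#!/bin/bash
# MTU sanity check (sketch): report NIC MTUs and test the path with DF set.
REMOTE_TEP=10.0.2.22   # example remote TEP

for ifc in eth0 eth1; do
    echo "$ifc mtu=$(cat /sys/class/net/$ifc/mtu)"
done

# 8972 = 9000 - 20 (IPv4 header) - 8 (ICMP header); must succeed without fragmenting
ping -M do -c 3 -s 8972 "$REMOTE_TEP"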

12.3 Troubleshooting Procedures

12.3.1 BGP Not Establishing

Symptoms: BGP neighbor stuck in “Active”, “Connect”, or “Idle” state

Diagnosis:

  1. Check basic connectivity:

    # Ping neighbor
    ping 172.16.1.1
    
    # Check if neighbor is reachable
    ip route get 172.16.1.1
  2. Verify configuration:

    # Check FRR configuration
    vtysh -c "show run"
    
    # Verify AS numbers
    vtysh -c "show ip bgp summary"
    
    # Check if ebgp-multihop is configured
    vtysh -c "show run | include ebgp-multihop"
  3. Check firewall rules:

    # BGP uses TCP port 179
    sudo iptables -L -n | grep 179
    
    # Allow BGP if needed
    sudo iptables -A INPUT -p tcp --dport 179 -j ACCEPT
  4. Check FRR logs:

    # View FRR logs
    tail -f /var/log/frr/frr.log
    
    # Or journalctl
    journalctl -u frr -n 100

Common fixes:

  • Verify ebgp-multihop is configured
  • Check that AS numbers match the configuration
  • Ensure IP forwarding is enabled: sysctl net.ipv4.ip_forward
  • Verify no firewall is blocking TCP port 179
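
The diagnosis steps above can be rolled into a quick triage script. The sketch below uses the example neighbor address 172.16.1.1 from this chapter; substitute your own peer:

#!/bin/bash
# BGP triage sketch for a single neighbor.
NEIGHBOR=${1:-172.16.1.1}   # example neighbor from this chapter

echo "--- Reachability ---"
ping -c 3 "$NEIGHBOR"
ip route get "$NEIGHBOR"

echo "--- Session state ---"
vtysh -c "show ip bgp neighbors $NEIGHBOR" | grep -E "BGP state|Last reset"

echo "--- Local prerequisites ---"
sysctl net.ipv4.ip_forward
sudo iptables -L INPUT -n | grep -w 179 || echo "No explicit rule for TCP/179"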

12.3.2 OVN Tunnels Not Working

Symptoms: VMs can’t communicate across hosts

Diagnosis:

  1. Verify TEP reachability:

    # Ping remote TEP
    ping 10.0.2.22
    
    # Should succeed - if not, BGP/routing issue
  2. Check OVN encapsulation config:

    # Verify TEP IP
    ovs-vsctl get open . external-ids:ovn-encap-ip
    
    # Verify encapsulation type
    ovs-vsctl get open . external-ids:ovn-encap-type
    
    # Should be: geneve
  3. Verify OVN central connectivity:

    # Check if ovn-controller can reach OVN databases
    ovs-vsctl get open . external-ids:ovn-remote
    
    # Check ovn-controller logs
    journalctl -u ovn-controller -n 100
  4. Check MTU:

    # GENEVE adds overhead
    # Underlay MTU should be ≥ overlay MTU + 50 bytes
    ip link show eth0 | grep mtu
    
    # Should be 9000 (or at least 1550)
  5. Verify GENEVE tunnels:

    # Show OVS tunnels
    ovs-vsctl show | grep genev
    
    # Show tunnel ports
    ovs-ofctl show br-int | grep genev

Common fixes:

  • Ensure the TEP IP is reachable via BGP
  • Set MTU to 9000 on the physical interfaces
  • Verify OVN central connectivity
  • Check that the ovn-controller service is running
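
A compact pass over these checks might look like the sketch below; the remote TEP address is an example, and the ovs-vsctl find syntax assumes a reasonably recent Open vSwitch:

#!/bin/bash
# OVN/GENEVE sanity sketch: encap settings, central connectivity, tunnel ports.
REMOTE_TEP=10.0.2.22   # example remote TEP

echo "encap-ip:   $(ovs-vsctl get open . external-ids:ovn-encap-ip)"
echo "encap-type: $(ovs-vsctl get open . external-ids:ovn-encap-type)"
echo "remote:     $(ovs-vsctl get open . external-ids:ovn-remote)"

echo "--- TEP reachability ---"
ping -c 3 "$REMOTE_TEP"

echo "--- GENEVE tunnel interfaces ---"
ovs-vsctl --columns=name,options find Interface type=geneve

echo "--- ovn-controller ---"
systemctl is-active ovn-controller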

12.3.3 Routes Not Propagating

Symptoms: Host /32 not visible in BGP or routing table

Diagnosis:

  1. Check BGP advertisement:

    # Check what's being advertised
    vtysh -c "show ip bgp neighbors 172.16.1.1 advertised-routes"
    
    # Should see host loopback /32
  2. Verify route-map and prefix-list:

    # Show route-map
    vtysh -c "show route-map"
    
    # Show prefix-list
    vtysh -c "show ip prefix-list"
    
    # Verify loopback is in prefix-list
  3. Check network statement:

    # Verify network is configured
    vtysh -c "show run | include network"
    
    # Should see: network 10.0.1.11/32
  4. Check next-hop reachability:

    # Next-hop must be reachable
    ip route get 172.16.1.1

Common fixes:

  • Add a network <loopback>/32 statement
  • Verify the route-map permits the prefix
  • Check that the prefix-list includes the loopback
  • Ensure the next-hop is reachable
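
If the loopback is missing from the advertisement, a fix along the lines of the sketch below adds the network statement and permits the prefix. The names and values (prefix-list LOOPBACKS, route-map EXPORT, loopback 10.0.1.11/32, AS 66111) are illustrative, and it assumes the route-map is already applied outbound to the neighbor:

# Sketch only: prefix-list and route-map names are illustrative examples
vtysh \
  -c "conf t" \
  -c "ip prefix-list LOOPBACKS seq 10 permit 10.0.1.11/32" \
  -c "route-map EXPORT permit 10" \
  -c "match ip address prefix-list LOOPBACKS" \
  -c "exit" \
  -c "router bgp 66111" \
  -c "address-family ipv4 unicast" \
  -c "network 10.0.1.11/32" \
  -c "exit-address-family" \
  -c "end" \
  -c "write memory"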

12.3.4 Performance Issues

Symptoms: Low bandwidth, high latency, packet loss

Diagnosis:

  1. Check ECMP distribution:

    # Verify ECMP is active
    ip route show | grep "nexthop"
    
    # Check if both paths are being used
    # (Use traffic monitoring tools)
  2. Monitor NIC utilization:

    # Check interface stats
    ip -s link show eth0
    ip -s link show eth1
    
    # Look for errors, drops
  3. Verify GENEVE offload:

    # Check OVS hardware offload
    ovs-vsctl get Open_vSwitch . other_config:hw-offload
    
    # Should be "true" for ConnectX-6 DX
  4. Check for congestion:

    # Monitor queue depth
    tc -s qdisc show dev eth0
    
    # Check for packet drops
    netstat -s | grep -i drop

Common fixes:

  • Enable hardware offload: ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
  • Verify ECMP maximum-paths is configured
  • Check for single-path failures (BGP session down)
  • Monitor for capacity issues
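
To confirm that ECMP is actually spreading flows, one approach is to run several parallel iperf3 streams (each stream uses a different source port and therefore a different 5-tuple hash) and watch the per-NIC transmit counters; the remote TEP address is an example:

# Generate 8 parallel TCP streams to an example remote TEP
iperf3 -c 10.0.2.22 -P 8 -t 30

# In another terminal: roughly balanced TX growth on eth0 and eth1
# indicates flows are hashing across both fabrics
watch -n1 'ip -s link show eth0 | grep -A1 "TX:"; ip -s link show eth1 | grep -A1 "TX:"'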

12.3.5 MTU Problems

Symptoms: Large packets fail, connectivity works for small packets

Diagnosis:

  1. Check MTU end-to-end:

    # Physical interfaces
    ip link show eth0 | grep mtu
    ip link show eth1 | grep mtu
    
    # Should be 9000
    
    # Test with ping
    ping -M do -s 8972 10.0.2.22
    
    # Should succeed (9000 - 28 bytes for IP/ICMP headers)
  2. Verify GENEVE MTU:

    # GENEVE overhead is ~38-50 bytes
    # If overlay MTU is 1500, underlay needs ≥ 1550
    # With a 9000-byte underlay (jumbo frames), keep the overlay MTU ≤ ~8950

Common fixes:

  • Set underlay MTU to 9000: ip link set eth0 mtu 9000
  • Verify all switches support jumbo frames
  • Check the end-to-end MTU path
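
tracepath (from iputils) is a convenient way to check the end-to-end MTU, since it reports the discovered path MTU hop by hop; the destination below is an example remote TEP:

# Discover the path MTU toward a remote TEP (example address)
tracepath -n 10.0.2.22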

12.4 Maintenance Procedures

12.4.1 Upgrading a Single Fabric (Rolling Upgrade)

Procedure to upgrade Fabric-A (zero downtime):

  1. Verify both fabrics healthy:

    # Check BGP on all hosts
    vtysh -c "show ip bgp summary"
    
    # Ensure both Network-A and Network-B paths exist
  2. Upgrade Spine-A switches one at a time:

    # For each Spine-A switch:
    
    # 1. Verify ECMP has alternate spines
    # 2. Upgrade switch OS/config
    # 3. Reboot switch
    # 4. Verify BGP sessions re-establish
    # 5. Wait for routes to stabilize
    # 6. Move to next spine
  3. Upgrade ToR-A switches one rack at a time:

    # For each ToR-A switch:
    
    # 1. Verify hosts have Network-B path
    # 2. Upgrade ToR-A
    # 3. Reboot ToR-A
    # 4. Verify host BGP sessions re-establish
    # 5. Verify host /32s are advertised
    # 6. Move to next rack
  4. Verify traffic distribution:

    # After Fabric-A upgrade:
    # Traffic should resume using both fabrics
    
    ip route show | grep "nexthop"
  5. Repeat for Fabric-B:

    • Same procedure for Fabric-B switches
    • Fabric-A now carries 100% load during upgrade

Key: Each fabric can be upgraded independently with zero impact on the other.
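
A lightweight way to confirm that each upgrade step converged is to snapshot the kernel routing table before taking a switch down and compare after it rejoins; the sketch below is one possible form:

#!/bin/bash
# Route-count snapshot/compare (sketch). Run once before a switch upgrade
# to record a baseline, and again afterwards to compare.
SNAPSHOT=/tmp/routes.before

if [ ! -f "$SNAPSHOT" ]; then
    ip route show > "$SNAPSHOT"
    echo "Baseline saved: $(wc -l < "$SNAPSHOT") routes"
else
    before=$(wc -l < "$SNAPSHOT")
    now=$(ip route show | wc -l)
    echo "Routes before: $before, now: $now"
    [ "$now" -ge "$before" ] || echo "WARNING: route count has dropped"
    rm -f "$SNAPSHOT"
fi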

12.4.2 Adding New Racks

Procedure:

  1. Physical installation:

    • Install ToR-A and ToR-B switches
    • Cable hosts to both ToRs
    • Cable ToRs to spines (or other ToRs in mesh)
  2. Allocate IP addresses:

    # Determine next rack number (e.g., Rack 7)
    # Allocate IP ranges:
    # - Host loopbacks: 10.0.7.0/24
    # - Host↔ToR links: 172.16.7.0/24
    # - ToR loopbacks: 10.254.7.1/32 (ToR-A), 10.254.7.2/32 (ToR-B)
    # (A helper that derives these ranges from the rack number is sketched
    #  after this procedure)
  3. Configure ToR switches:

    • Configure BGP peers (to hosts and spines)
    • Set loopback IPs
    • Configure point-to-point links
  4. Configure hosts:

    • Run network setup scripts
    • Configure FRR/BGP
    • Configure OVN
  5. Verify BGP:

    # On new hosts
    vtysh -c "show ip bgp summary"
    
    # On ToRs
    # Verify host /32s are learned and advertised
  6. Verify connectivity:

    # From new host, ping existing host
    ping -I 10.0.7.11 10.0.1.11
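
As referenced in step 2, the per-rack addressing can be derived mechanically from the rack number. The sketch below assumes the numbering convention used in this chapter's examples:

#!/bin/bash
# Derive per-rack address ranges from the rack number (sketch; the convention
# below is inferred from the examples in this chapter).
RACK=${1:-7}

echo "Host loopbacks   : 10.0.${RACK}.0/24"
echo "Host-ToR links   : 172.16.${RACK}.0/24"
echo "ToR-A loopback   : 10.254.${RACK}.1/32"
echo "ToR-B loopback   : 10.254.${RACK}.2/32"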

12.4.3 Replacing ToR Switches

Procedure to replace ToR-A in Rack 1:

  1. Pre-check:

    # Verify all hosts have Network-B path
    # On each host in Rack 1:
    vtysh -c "show ip route 10.0.0.0/8" | grep eth1
    
    # Traffic will use Network-B during replacement
  2. Shut down ToR-A gracefully:

    # On ToR-A:
    # Shut down BGP to drain traffic gracefully
    vtysh -c "conf t" -c "router bgp 65101" -c "shutdown"
    
    # Wait for BGP to withdraw routes
    # Monitor: Traffic shifts to Network-B
  3. Physical replacement:

    • Power down old ToR-A
    • Install new ToR-A
    • Verify cabling
  4. Configure new ToR-A:

    • Apply configuration (same as old ToR-A)
    • Set loopback IP: 10.254.1.1/32
    • Configure BGP peers
  5. Bring up BGP:

    # On new ToR-A:
    # BGP should auto-establish with hosts and spines
    vtysh -c "show ip bgp summary"
  6. Verify:

    # On hosts in Rack 1:
    # Verify ECMP routes re-appear
    ip route show | grep "nexthop"
    
    # Should see both eth0 and eth1 paths
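
While the new ToR-A re-converges, a simple watch loop on an affected host shows when both fabric paths are back:

# Refresh every 2 seconds until routes via both eth0 and eth1 reappear
watch -n2 'ip route show | grep "nexthop"'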

12.4.4 Replacing Spine Switches

Procedure to replace Spine-1 (in Fabric-A):

  1. Pre-check:

    # Verify alternate spines exist in Fabric-A
    # Verify Fabric-B is healthy (will carry more load)
  2. Graceful shutdown:

    # On Spine-1:
    vtysh -c "conf t" -c "router bgp 65010" -c "shutdown"
    
    # BGP withdraws routes, ECMP redistributes to other spines
  3. Physical replacement:

    • Power down old Spine-1
    • Install new Spine-1
    • Verify cabling to all ToRs
  4. Configure new Spine-1:

    • Apply configuration
    • Set loopback IP: 10.255.0.1/32
    • Configure BGP peers to all ToR-A switches
  5. Verify:

    # On Spine-1:
    vtysh -c "show ip bgp summary"
    
    # Should see all ToR-A switches
    # Should learn all host /32s

12.4.5 Decommissioning Hosts

Procedure:

  1. Drain workloads:

    • Migrate VMs to other hosts (OpenStack live migration)
    • Drain Kubernetes pods (kubectl drain)
  2. Shut down BGP:

    # On host being decommissioned:
    vtysh -c "conf t" -c "router bgp 66111" -c "shutdown"
    
    # Loopback /32 withdrawn from fabric
  3. Verify routes withdrawn:

    # On ToRs:
    vtysh -c "show ip bgp" | grep 10.0.1.11
    
    # Should not appear
  4. Power down:

    • Stop OVN controller
    • Power down host
    • Remove from inventory

12.5 Failure Scenarios & Recovery

12.5.1 Single ToR Failure

Scenario: ToR-A in Rack 1 fails completely

What happens:

  1. BFD detects the link failure within 100-300 ms
  2. BGP sessions to ToR-A drop
  3. BGP withdraws the Network-A paths
  4. ECMP automatically uses only Network-B paths
  5. All traffic routes via eth1 → ToR-B

Impact:

  • Hosts in Rack 1 lose their Network-A path
  • Network-B carries 100% of the load
  • No packet loss (if BGP/BFD are configured correctly)

Recovery:

  1. Immediate: traffic automatically shifts to ToR-B
  2. Replace ToR-A: follow the “Replacing ToR Switches” procedure
  3. Verify: ECMP paths are restored after replacement

Monitoring:

# On affected hosts:
ip route show | grep "nexthop"

# Should see only eth1 path during failure
# Should see both paths after recovery

12.5.2 Single Spine Failure

Scenario: Spine-1 (Fabric-A) fails

What happens:

  1. BGP sessions drop between Spine-1 and all ToR-A switches
  2. ToRs withdraw routes via Spine-1
  3. ECMP redistributes across the remaining Spine-A switches
  4. Fabric-B continues normally

Impact:

  • ECMP fanout in Fabric-A is reduced
  • Remaining Spine-A switches handle more load
  • Fabric-B is unaffected

Recovery:

  • Replace the spine following the “Replacing Spine Switches” procedure
  • ECMP automatically redistributes after replacement

12.5.3 Complete Fabric-A Failure

Scenario: All of Fabric-A fails (ToR-A and Spine-A switches)

What happens:

  1. All Network-A BGP sessions drop
  2. All Network-A routes are withdrawn
  3. ECMP uses only Network-B paths
  4. 100% of traffic flows over Fabric-B

Impact:

  • All hosts lose Network-A connectivity
  • Fabric-B must handle 100% of the load
  • No packet loss if Fabric-B is sized for 100% capacity

Recovery:

  • Diagnose the root cause (power, configuration, hardware)
  • Restore the Fabric-A switches
  • BGP sessions re-establish automatically
  • ECMP restores both paths

Critical: This is why each fabric must be sized for 100% load.

12.5.4 Host NIC Failure

Scenario: eth0 fails on a host

What happens:

  1. BFD detects the link failure
  2. The BGP session to ToR-A drops
  3. BGP withdraws the Network-A path for this host
  4. Traffic to this host uses only the Network-B path

Impact:

  • The host loses Network-A connectivity
  • All traffic flows via eth1
  • Bandwidth is reduced to 100G (from 200G)

Recovery:

# Diagnose NIC issue
ip link show eth0

# Check for hardware errors
ethtool -S eth0 | grep -i error

# Replace NIC if needed
# BGP re-establishes automatically after replacement

12.5.5 Power Domain Failure

Scenario: PDU-A fails, affecting all Fabric-A switches

What happens:

  • Same as “Complete Fabric-A Failure”
  • All Fabric-A switches lose power
  • Fabric-B handles 100% of the load

Recovery:

  • Restore power to PDU-A
  • Switches boot up
  • BGP sessions re-establish
  • ECMP paths are restored

Prevention: This is why power domain separation is critical (see Network Architecture Overview).

12.6 Command Reference

12.6.1 FRR/BGP Commands

Status and information:

# BGP summary
vtysh -c "show ip bgp summary"

# All BGP routes
vtysh -c "show ip bgp"

# Specific route
vtysh -c "show ip bgp 10.0.1.11/32"

# Neighbor details
vtysh -c "show ip bgp neighbors 172.16.1.1"

# Advertised routes
vtysh -c "show ip bgp neighbors 172.16.1.1 advertised-routes"

# Received routes
vtysh -c "show ip bgp neighbors 172.16.1.1 routes"

# Routing table
vtysh -c "show ip route"

# BFD peers
vtysh -c "show bfd peers"

Configuration:

# Enter FRR shell
vtysh

# Enter config mode
conf t

# Show running config
show run

# Save config
write memory

12.6.2 OVN/OVS Commands

OVS status:

# Show OVS configuration
ovs-vsctl show

# Show bridges
ovs-vsctl list-br

# Show ports on bridge
ovs-vsctl list-ports br-int

# Show interfaces
ovs-vsctl list interface

# Show OVS configuration
ovs-vsctl get open . external-ids

OVN status:

# Show OVN logical topology (northbound)
ovn-nbctl show

# Show OVN southbound database
ovn-sbctl show

# List chassis
ovn-sbctl list chassis

# Show chassis details
ovn-sbctl show <chassis-name>

# Check OVN controller
systemctl status ovn-controller
journalctl -u ovn-controller -n 100

OVN configuration:

# Set TEP IP
ovs-vsctl set open . external-ids:ovn-encap-ip=10.0.1.11

# Set encapsulation type
ovs-vsctl set open . external-ids:ovn-encap-type=geneve

# Set OVN remote (central database)
ovs-vsctl set open . external-ids:ovn-remote=tcp:10.254.0.100:6642

12.6.3 Network Verification Commands

Interface status:

# Show all interfaces
ip addr show

# Show specific interface
ip link show eth0

# Show interface statistics
ip -s link show eth0

# Check for errors
ethtool -S eth0 | grep -i error

Routing:

# Show routing table
ip route show

# Show specific route
ip route get 10.0.2.22

# Show ECMP routes
ip route show | grep "nexthop"

# Show routes for specific prefix
ip route show 10.0.0.0/8

Connectivity:

# Ping with specific source
ping -I 10.0.1.11 10.0.2.22

# Traceroute
traceroute -s 10.0.1.11 10.0.2.22

# TCP connectivity test
nc -v -z 10.0.2.22 22

# UDP connectivity (for GENEVE port 6081)
nc -v -u -z 10.0.2.22 6081

Performance:

# Network throughput test
iperf3 -c 10.0.2.22 -P 10

# Monitor traffic
iftop -i eth0

# Check packet counters
watch -n1 'ip -s link show eth0 | grep -A1 "RX:"'

12.6.4 Debugging Tools

Packet capture:

# Capture on physical interface
tcpdump -i eth0 -n

# Capture GENEVE packets
tcpdump -i eth0 -n 'udp port 6081'

# Capture BGP packets
tcpdump -i eth0 -n 'tcp port 179'

# Capture and write to file
tcpdump -i eth0 -w capture.pcap

# Capture with specific filters
tcpdump -i eth0 -n 'host 10.0.2.22'

OVS flow analysis:

# Show flows
ovs-ofctl dump-flows br-int

# Show specific flow
ovs-ofctl dump-flows br-int | grep <pattern>

# Monitor flows
watch -n1 'ovs-ofctl dump-flows br-int | grep <pattern>'

System resources:

# CPU usage
top

# Memory usage
free -h

# Disk I/O
iostat -x 1

# Process monitoring
ps aux | grep ovn
ps aux | grep frr

12.7 Capacity Planning & Scaling

12.7.1 Bandwidth Calculations

Per host:

  • Normal: 40-60% per NIC (80-120G total)
  • Peak: up to 200G aggregate
  • Failover: 100% on a single NIC (must plan for this)

Per fabric:

  • Normal: 40-60% utilization
  • Failover: 100% capacity required

Per ToR:

  • Must handle 100% of rack traffic during failover
  • Rack with 25 hosts: 25 × 100G = 2.5 Tbps per fabric

Per spine:

  • Must handle the sum of all ToR uplinks in the fabric
  • Plan for N-1 spine failures

12.7.2 Sizing for Failover

Critical principle: Size each fabric for 100% load during failover.

Don’t assume a 50/50 split. Plan for complete fabric failure:

  • Fabric-A fails → Fabric-B carries 100%
  • Fabric-B fails → Fabric-A carries 100%

Oversubscription planning:

  • ToR downlinks: 25 hosts × 100G = 2.5 Tbps
  • ToR uplinks: size based on expected traffic patterns
  • Typical: 4:1 to 8:1 oversubscription at the ToR
  • No oversubscription at the spine (non-blocking fabric)
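
The arithmetic above can be made concrete with a small helper. The figures come from this section; the 400G uplink speed is an assumption used to show the rounding:

#!/bin/bash
# Oversubscription sizing sketch (figures from this section; 400G uplinks assumed).
HOSTS_PER_RACK=25
HOST_NIC_GBPS=100
RATIO=4              # target oversubscription N:1
UPLINK_GBPS=400

downlink=$((HOSTS_PER_RACK * HOST_NIC_GBPS))          # host-facing capacity per fabric
needed=$((downlink / RATIO))                          # uplink capacity at RATIO:1
links=$(((needed + UPLINK_GBPS - 1) / UPLINK_GBPS))   # round up to whole links

echo "ToR downlink capacity   : ${downlink} Gbps"
echo "Uplink needed at ${RATIO}:1   : ${needed} Gbps (${links} x ${UPLINK_GBPS}G uplinks)"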

12.7.3 Adding Capacity

When to add capacity:

  • Fabric utilization > 60% during normal operation
  • Spine uplinks approaching saturation
  • Host count approaching ToR port limits

Capacity expansion options:

  1. Add more spines: increase ECMP fanout
  2. Upgrade ToR uplinks: 100G → 400G
  3. Add a new Network Pod: scale horizontally
  4. Upgrade NIC speeds: 100G → 400G per host

12.7.4 Migration to Super-Spine

When to migrate:

  • Scaling beyond 10-15 racks
  • Need for network segmentation
  • Geographic distribution

Procedure:

  1. Deploy super-spine switches
  2. Create Network Pods (group existing racks)
  3. Connect pod spines to super-spines
  4. Configure BGP between spines and super-spines
  5. Adjust IP addressing to a hierarchical scheme
  6. Migrate incrementally, pod by pod

12.8 Health Checks

12.8.1 Daily Health Check Script

#!/bin/bash
# Daily health check for OpenStack DC Network

echo "=== BGP Health Check ==="
vtysh -c "show ip bgp summary" | grep -E "Established|Active|Connect"

echo "=== ECMP Routes Check ==="
ip route show | grep -c "nexthop"

echo "=== OVN Controller Health ==="
systemctl is-active ovn-controller

echo "=== Interface Status ==="
ip link show eth0 | grep -E "state UP|state DOWN"
ip link show eth1 | grep -E "state UP|state DOWN"

echo "=== Recent Errors ==="
journalctl --since "1 hour ago" | grep -i "error\|fail" | tail -10

12.8.2 Automated Monitoring

Metrics to monitor:

  • BGP session state (up/down)
  • Number of routes learned
  • Interface status (up/down)
  • Interface errors and drops
  • OVN controller status
  • Bandwidth utilization per interface
  • ECMP path count
  • BFD session status

Alerting thresholds:

  • BGP session down > 1 minute
  • Routes missing (expected count not met)
  • Interface errors > 100/hour
  • Bandwidth > 80% on a single fabric
  • OVN controller down
  • Both NICs using the same path (ECMP broken)
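
One way to feed these metrics into an existing monitoring stack is a small exporter script. The sketch below writes a few of them in Prometheus textfile format; the output path assumes node_exporter's textfile collector reads /var/lib/node_exporter/textfile, which is an assumption to adapt to your setup:

#!/bin/bash
# Emit a few network health metrics in Prometheus textfile format (sketch).
# Assumes node_exporter's textfile collector reads /var/lib/node_exporter/textfile.
OUT=/var/lib/node_exporter/textfile/dc_network.prom

{
  echo "dc_network_ecmp_nexthops $(ip route show | grep -c nexthop)"
  if systemctl is-active --quiet ovn-controller; then ovn_up=1; else ovn_up=0; fi
  echo "dc_network_ovn_controller_up $ovn_up"
  for ifc in eth0 eth1; do
    if [ "$(cat /sys/class/net/$ifc/operstate)" = "up" ]; then up=1; else up=0; fi
    echo "dc_network_interface_up{interface=\"$ifc\"} $up"
  done
} > "$OUT.tmp" && mv "$OUT.tmp" "$OUT"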

12.9 References