17 Google Jupiter Comparison
17.1 Overview
This appendix compares our underlay approach (L3 Clos fabric using eBGP + ECMP + BFD, with OVN/GENEVE overlay handled by the hosts) with Google’s Jupiter network (as described in their 2015 paper and their 2022 evolution write-up).
The goal of this analysis is to understand what is conceptually the same, what is different, and what we can learn.
17.2 What is the same (the big ideas)
At a high level, Jupiter and our design agree on the “shape” of a good datacenter network:
- Clos / leaf–spine thinking: build the network from many smaller switches, with lots of parallel paths.
- Multipath as the default: rely on having many equal-ish routes, so failures are handled by routing around them.
- Scale-out mindset: add racks/switches and grow capacity incrementally rather than redesigning the entire fabric.
- Operate the network as a system: strong automation and consistency matter more than clever per-device configuration.
In other words: the core idea is the same — many paths + repeatable building blocks.
17.3 What is different (and why that’s OK)
1. Control plane: Google built a custom, centralized system; we use standard BGP
Google’s earlier Jupiter design leaned heavily on centralized control for routing decisions inside the fabric, whereas we are intentionally using a distributed approach:
- Google (2015): centralized control plane that can compute and program routing behavior consistently at huge scale.
- Ours: distributed eBGP everywhere (host↔ToR, ToR↔spine, spine↔super-spine), with ECMP for multipath and BFD for fast failure detection (a minimal configuration sketch appears at the end of this point).
This is mostly a “scale + operations” decision:
- At hyperscale, Google can justify custom systems and dedicated teams.
- At our scale, standard protocols are simpler to run, easier to hire for, and easier to debug.
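To make the eBGP + ECMP + BFD bullet above concrete, the sketch below renders an FRR-style configuration for a single ToR: eBGP sessions toward spines and hosts, multipath enabled, and BFD on every session. The ASNs, interface names, and peer counts are illustrative assumptions rather than our actual plan, and exact FRR syntax can vary between versions.

```python
# Minimal sketch (not production tooling): render an FRR-style ToR config with
# eBGP toward hosts and spines, ECMP enabled, and BFD on every session.
# ASNs, interface names, and peer counts are illustrative assumptions.

TOR_ASN = 65101                                             # hypothetical ToR ASN
SPINE_PEERS = [("swp49", 65001), ("swp50", 65002)]          # uplinks to two spines
HOST_PEERS = [(f"swp{i}", 65200 + i) for i in range(1, 5)]  # routed host uplinks

def render_frr_config(tor_asn, peers):
    lines = [f"router bgp {tor_asn}",
             " bgp bestpath as-path multipath-relax"]   # allow ECMP across different peer ASNs
    for iface, peer_asn in peers:
        lines.append(f" neighbor {iface} interface remote-as {peer_asn}")  # unnumbered p2p session
        lines.append(f" neighbor {iface} bfd")                             # fast failure detection
    lines += [" address-family ipv4 unicast",
              "  maximum-paths 64",                      # install many equal-cost next hops
              " exit-address-family"]
    return "\n".join(lines)

if __name__ == "__main__":
    print(render_frr_config(TOR_ASN, SPINE_PEERS + HOST_PEERS))
```

The point of the sketch is that the entire distributed control plane for one device fits in a handful of lines of standard configuration, which is exactly the operational simplicity we are optimizing for.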
2. Edge model (ToR ↔︎ hosts): Google described L2 under a ToR; we do L3 to host uplinks
In the 2015 Jupiter paper, the fabric routes up to the ToR, and hosts below a ToR can still be part of an L2 domain (broadcast domain).
Ours is “L3-to-the-edge”, with no L2 aggregation between the ToR and its hosts:
- each host NIC is a routed point-to-point link
- the underlay has no L2 broadcast domain
- any tenant “L2-like” behavior is handled in OVN’s overlay (GENEVE), not by the physical switches
This keeps the physical network simpler and keeps virtualization semantics (tenants, security groups, logical switches) in one place: OVN.
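To illustrate what “each host NIC is a routed point-to-point link” looks like in practice, the sketch below carves one /31 subnet per host uplink out of an underlay pool. The pool, rack count, and host count are hypothetical, chosen only to make the example runnable.

```python
# Sketch: allocate a /31 point-to-point subnet per routed host uplink, so no
# two hosts ever share an L2 segment. Range and counts are hypothetical.
import ipaddress

UNDERLAY_P2P_RANGE = ipaddress.ip_network("100.64.0.0/20")  # assumed p2p pool

def host_uplink_links(pool, racks, hosts_per_rack):
    """Yield (rack, host, tor_ip, host_ip) for each routed point-to-point link."""
    subnets = pool.subnets(new_prefix=31)        # one /31 per link (RFC 3021)
    for rack in range(racks):
        for host in range(hosts_per_rack):
            link = next(subnets)
            tor_ip, host_ip = tuple(link)        # a /31 has exactly two usable addresses
            yield rack, host, tor_ip, host_ip

for rack, host, tor_ip, host_ip in host_uplink_links(UNDERLAY_P2P_RANGE, racks=2, hosts_per_rack=4):
    print(f"rack{rack}/host{host}: ToR {tor_ip}/31 <-> host {host_ip}/31")
```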
3. Evolution (2022): Google adds SDN + optical switching + traffic engineering; we keep ECMP
Google’s newer Jupiter evolution emphasizes incremental upgrades and heterogeneous link speeds using optical circuit switching and software-driven topology/traffic engineering.
Our plan is intentionally simpler:
- we rely on ECMP to spread load statistically (see the sketch below)
- we scale by adding spines, then adding super-spines and new Network Pods (NP) when needed
- we optimize for predictable operations rather than application-aware traffic engineering
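The sketch below is a toy illustration of the first point above: a hash over each flow’s 5-tuple picks one of N equal-cost next hops, so load spreads statistically across spines, and adding spines simply increases N. The hash function and traffic pattern are stand-ins for what a real switch ASIC does.

```python
# Sketch: how ECMP spreads flows statistically. A per-flow hash over the
# 5-tuple picks one of N equal-cost next hops; adding spines raises N.
# The hash and flow generator are toy stand-ins for switch ASIC behavior.
import random
from collections import Counter
from zlib import crc32

def ecmp_next_hop(flow, next_hops):
    """Hash the 5-tuple; the same flow always maps to the same path."""
    key = "|".join(map(str, flow)).encode()
    return next_hops[crc32(key) % len(next_hops)]

def simulate(num_spines, num_flows=100_000, seed=7):
    rng = random.Random(seed)
    spines = [f"spine{i}" for i in range(num_spines)]
    load = Counter()
    for _ in range(num_flows):
        flow = (f"10.0.{rng.randrange(256)}.{rng.randrange(256)}",  # src IP
                f"10.1.{rng.randrange(256)}.{rng.randrange(256)}",  # dst IP
                rng.randrange(1024, 65535), 443, "tcp")             # sport, dport, proto
        load[ecmp_next_hop(flow, spines)] += 1
    return load

# Doubling the number of spines roughly halves the per-spine flow count.
for n in (4, 8):
    print(n, "spines:", dict(simulate(n)))
```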
17.4 Why did Google choose a central control plane instead of distributed BGP?
Distributed BGP at datacenter scale was less proven in 2015, when Google built Jupiter. They needed absolute consistency and predictability at hyperscale (100K+ servers), so they built a centralized SDN control plane.
Today’s context is different: modern FRR/BGP stacks are mature, ECMP + BFD is battle-tested, and distributed routing is the industry default for medium-scale datacenters. We benefit from this maturity without needing custom control planes.
17.5 Terminology mapping (Google vs our book)
| Google term | Rough equivalent in our book |
|---|---|
| ToR (Top of Rack) | ToR |
| Aggregation block | Pod spine layer |
| Spine block | Super-spine |
| Block | Network Pod (NP) |
Note: Google often uses “block” to mean a group of switches treated as a unit. We use “Network Pod (NP)” for the same idea.
17.6 What we can learn from Jupiter
Even though we won’t replicate Jupiter’s custom control plane, there are several takeaways that translate directly:
- Design in repeatable units (pods/blocks): our “Network Pod (NP)” approach is the right direction.
- Invest in automation + verification: treat configuration as code, validate adjacency, validate routes, validate ECMP behavior (a check sketch follows after this list).
- Make upgrades boring: aim for small, reversible changes and clearly defined failure domains.
- Observability matters: monitor BGP/BFD health, link utilization, and overlay tunnel utilization so we can spot imbalances early.
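As an example of the automation + verification bullet above, the sketch below shows the kind of post-change check we have in mind, assuming FRR’s vtysh JSON output (field names differ between FRR versions, so treat it as a starting point rather than a drop-in script): it flags BGP sessions that are not Established and counts the ECMP next hops installed for a probe prefix.

```python
# Sketch of a post-change check, assuming FRR's vtysh JSON output.
# Field names vary across FRR versions; adapt before relying on it.
import json
import subprocess

def vtysh_json(command):
    """Run a vtysh show command and parse its JSON output."""
    out = subprocess.run(["vtysh", "-c", command], check=True,
                         capture_output=True, text=True).stdout
    return json.loads(out)

def check_bgp_sessions():
    """Return (afi, peer, state) for every session that is not Established."""
    summary = vtysh_json("show bgp summary json")
    bad = []
    for afi, data in summary.items():                  # e.g. "ipv4Unicast"
        if not isinstance(data, dict):
            continue
        for peer, info in data.get("peers", {}).items():
            if info.get("state") != "Established":
                bad.append((afi, peer, info.get("state")))
    return bad

def check_ecmp(prefix, expected_paths):
    """Count installed next hops for a prefix and compare to the expected ECMP width."""
    route = vtysh_json(f"show ip route {prefix} json")
    paths = sum(len(entry.get("nexthops", [])) for entry in route.get(prefix, []))
    return paths, paths == expected_paths

if __name__ == "__main__":
    print("non-established sessions:", check_bgp_sessions())
    print("ECMP paths for 0.0.0.0/0:", check_ecmp("0.0.0.0/0", expected_paths=4))
```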
17.7 Summary
| Topic | Google Jupiter (high level) | Our design (high level) |
|---|---|---|
| Physical shape | Clos with many parallel paths | Clos with many parallel paths |
| Control plane | Centralized + custom systems (esp. 2015) | Distributed eBGP + ECMP + BFD |
| ToR-to-host model | Described with L2 under ToR (2015) | L3 to host uplinks (p2p) |
17.8 References
- Jupiter Rising (SIGCOMM 2015): https://dl.acm.org/doi/pdf/10.1145/2829988.2787508
- The evolution of Google’s Jupiter data center network (Google Cloud Blog, 2022): https://cloud.google.com/blog/topics/systems/the-evolution-of-googles-jupiter-data-center-network