Google's Load Balancer
Page content
Maglev
- Google( https://research.google.com/pubs/pub44824.html )
- Used in Google Cloud since 2008
- Scalable load balancer
- Consistent hashing
- Connection Tracking
- Scale-out model backed by router’s ECMP
- Bypass kernel space for performance.
- Support connection persistence
Network Architecture
- DNS - Routers - Maglevs - Service EndPoints.
- One service is served by one or more VIPs
- DNS returns VIP considering geolocation and load of location
- One VIP is served by multiple Maglevs
- Router use ECMP to select one Maglev
- One VIP is mapped to multiple Service EndPoints
- Maglev select Service EndPoint by seletion algorithm and connection tracking table
- Maglev use GRE to send incoming packet to Service EndPoint or another Maglev
- Send to IP fragment to another special Maglev servers
- Use only 3-tuple for IP fragment
- Each Service EndPoint use Direct Server Return(DSR)
Maglev
Controller
- Responsible for VIP announcement with BGP
- Check health status of forwarder
- If forwarder is not headthy, withdraw all VIP announcements
Forwarder
- Each VIP has one or multiple backend pools(BP)
- BP contain physical IP address of the Service EndPoint
- Each BP has specific health checking methods - depends on the service requirement(just reachability or more)
- Config Manager parse and update configuration of forwarder’s behavior based on the Config Objects
- Sharding
- Sharding of Maglev enables service isolation - new service or QoS
Backend Selection
- Consistent Hashing distribute loads
- Record selection in LOCAL connection tracking table
- Connection tracking table is not shared with another Maglev
- Does not guarantee consistency on Maglev or Service EndPoint Changes(add/delete)
- For different traffic type
- TCP SYN : select Backend and record it in connection tracking table
- TCP non-SYN : lookup connection tracking table
- 5-tuple : (maybe) lookup connection tracking table and select backend if not found
Consistent Hashing
- If Maglev is added or removed, router select different Maglev for the exsiting session - ECMP is changed
- If one Maglev’s local connection tracking table is overflowed, it will lose previous selection
- To resolve this issues,
- Synchronize local connection tracking table between Maglevs -> overhead, overhead, overhead
- Consistent hashing for minimize disruption in member changes
- Maglev hashing - load balancing and minimal disruption on member changes