Google

Google's Load Balancer

Maglev (https://research.google.com/pubs/pub44824.html)

- Serving Google's traffic since 2008; also provides load balancing for Google Cloud
- Scalable load balancer
  - Consistent hashing
  - Connection tracking
  - Scale-out model backed by the routers' ECMP
- Bypasses kernel space for performance
- Supports connection persistence

Network architecture

- DNS -> Routers -> Maglevs -> Service Endpoints
- One service is served by one or more VIPs
- DNS returns a VIP based on geolocation and the load of each location
- One VIP is served by multiple Maglevs; the router uses ECMP to select one Maglev
- One VIP is mapped to multiple Service Endpoints; Maglev selects a Service Endpoint using its selection algorithm and connection tracking table
- Maglev uses GRE to send an incoming packet to a Service Endpoint or to another Maglev
  - IP fragments are sent on to another, designated Maglev server
  - Only the 3-tuple is used for IP fragments
- Each Service Endpoint uses Direct Server Return (DSR)

Maglev

- Controller
  - Responsible for announcing VIPs with BGP
  - Checks the health status of the forwarder; if the forwarder is not healthy, it withdraws all VIP announcements
- Forwarder
  - Each VIP has one or more backend pools (BP)
  - A BP contains the physical IP addresses of the Service Endpoints
  - Each BP has its own health checking method, depending on the service requirement (just reachability, or more)
  - The Config Manager parses the Config Objects and updates the forwarder's behavior accordingly
- Sharding
  - Sharding Maglevs enables service isolation, for example for a new service or for QoS

Backend selection

- Consistent hashing distributes load
- The selection is recorded in the LOCAL connection tracking table
  - The connection tracking table is not shared with other Maglevs
  - It does not guarantee consistency when the set of Maglevs or Service Endpoints changes (add/delete)
- By traffic type
  - TCP SYN: select a backend and record it in the connection tracking table
  - TCP non-SYN: look up the connection tracking table
  - 5-tuple: (maybe) look up the connection tracking table and select a backend if not found

Consistent hashing

- If a Maglev is added or removed, ECMP changes and the router selects a different Maglev for an existing session
- If a Maglev's local connection tracking table overflows, it loses previous selections
- Ways to resolve these issues
  - Synchronize the local connection tracking tables between Maglevs -> overhead, overhead, overhead
  - Consistent hashing, to minimize disruption on member changes
  - Maglev hashing: load balancing and minimal disruption on member changes

References

- Maglev: A Fast and Reliable Software Network Load Balancer
- Consistent Hashing
- The Simple Magic of Consistent Hashing
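The backend selection notes above can be sketched in a few lines of Python. This is a minimal illustration, not Google's implementation: it fills a lookup table in the style of Maglev hashing (each backend takes turns claiming slots from its own permutation of the table) and puts a local connection tracking dictionary in front of it. The names `build_lookup_table` and `Forwarder`, the SHA-256 based hash, the table size of 251, and the rule that a non-SYN packet with no tracked entry falls back to the hash are all assumptions made for this example.

```python
import hashlib


def _h(data: str, salt: str) -> int:
    """Deterministic 64-bit hash (SHA-256 is an arbitrary choice for this sketch)."""
    digest = hashlib.sha256(f"{salt}:{data}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_lookup_table(backends, table_size=251):
    """Fill a lookup table in the style of Maglev hashing.

    Assumption: table_size is prime, so every (offset, skip) pair walks
    through all slots before repeating.
    """
    if not backends:
        raise ValueError("at least one backend is required")
    offsets = [_h(b, "offset") % table_size for b in backends]
    skips = [_h(b, "skip") % (table_size - 1) + 1 for b in backends]
    next_idx = [0] * len(backends)      # position in each backend's permutation
    table = [None] * table_size
    filled = 0
    while True:
        # Backends take turns, so slot counts differ by at most one.
        for i, backend in enumerate(backends):
            slot = (offsets[i] + next_idx[i] * skips[i]) % table_size
            while table[slot] is not None:   # slot taken: try next preference
                next_idx[i] += 1
                slot = (offsets[i] + next_idx[i] * skips[i]) % table_size
            table[slot] = backend
            next_idx[i] += 1
            filled += 1
            if filled == table_size:
                return table


class Forwarder:
    """Hypothetical forwarder: local connection tracking in front of the table."""

    def __init__(self, backends, table_size=251):
        self.table = build_lookup_table(backends, table_size)
        self.conn_track = {}            # LOCAL only, never shared with peers

    def select(self, five_tuple, syn=False):
        # Non-SYN packets reuse the recorded choice when one exists.
        if not syn and five_tuple in self.conn_track:
            return self.conn_track[five_tuple]
        # SYN, or no record found: pick by hashing the 5-tuple, then record it.
        key = "|".join(str(field) for field in five_tuple)
        backend = self.table[_h(key, "flow") % len(self.table)]
        self.conn_track[five_tuple] = backend
        return backend
```

Two `Forwarder` instances built from the same backend list pick the same backend for a given 5-tuple without sharing any state, which is what lets a flow survive an ECMP change that moves it to another Maglev. The connection tracking entry, in turn, keeps an established flow on its backend even if the lookup table is rebuilt afterwards.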

Espresso - Google's peering edge architecture

Google Fellow Amin Vahdat: "Early on, we realized that the network we needed to support our services did not exist and could not be bought."

- Espresso makes Google Cloud faster, more available, and more cost-effective by extending SDN to the public internet
- The network should be treated as a large-scale distributed system, leveraging the same control infrastructure Google developed for its compute and storage systems
- Four pillars of Google's SDN strategy
  - Jupiter: Google employed SDN principles to build Jupiter, a data center interconnect capable of supporting more than 100,000 servers

Googlegeist vs. SCI

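The point is that no substantive effort to improve the SCI results is visible. Even where there is effort, it ought to be a joint effort between managers and non-managers (the complaints are about managers or the company, so whether the improvement work is heading in the right direction obviously has to be discussed with non-managers too), yet that is rarely seen. Managers need to drop the illusion that "you raise the problems, but only I can solve them." If you cannot even empathize with employees' complaints, how can you work to resolve them? And if you cannot empathize, you should at least talk together, either to understand or to persuade, but for the most part we are told to sort that out on our own.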

Culture should be set up first

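From Google Work Rules: the author joined Google in 2006. Born in 1972, which is older than the average Google employee, but still, compared with the HR heads of domestic companies of a similar size... Then again, comparing Google with domestic (large) companies is meaningless in itself. Even granting that reality, could we ever find someone around us who thinks like that? Reality is far too arduous for that.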