Cuckoo Switch

Page content

Cuckoo 알고리즘을 사용하여 Flow lookup과 flow update 성능을 높힌 것과 DPDK를 이용하여 패킷 처리 성능을 높힌 것

출처 : Scalable, High Performance Ethernet Forwarding with CuckooSwitch

DPDK

  • DPDK를 이용한 IO 성능 개선한 것 외에 특이한 것은 없음.

Cuckoo hashing

  • 대개 FIB update를 위해 RCU(Read Copy Update)를 사용함. 이 경우 완전한 정보를 갖는 additional entry가 필요

    • 수정된 cuckoo algorithm을 기반으로 한 flow table 사용
  • Basic Cuckoo hashing

    • ensures 50% table space utilization
  • 4-way associative hash table has 95% utilization

  • 메모리 사용량 감소 및 multiple reader와 single writer가 동시에 접근할 수 있게 하여 lock 사용없이 FIB update 성능 개선

  • Compiler reorerding barriers

    • ‘a sequence of memory writes at one core are guaranteed to appear in the same order at all remote CPUs’
    • If one cores write W1,W2 and W3 and the R1 from some core observes the effect of W3, then the R2 and R3 which is issued after R1 also observed the same effect
    • Check the words “compiler reordering barriers”
    • ensure the compiler does not reorder the store and read instructions using a compiler reordering barrier between the read operations which should keep the instruction order
    • __asm__ __volatile__("" ::: "memory") in gcc
    • mark volatile for the fields to access
  • Batch Hash table lookup

    • If the size of the hash table cannot fit the fast CPU cache size, then performance drops dramatically.
    • For this, batching 16 packets for one lookup
    • Combine all the packets in the buffers as a single batch and perform hash table lookup at the same time
    • FIXME - How to batch lookups
  • Prefetch

    • FIXME

Performance

  • 10^9 entries
  • 350M lookups/second
  • 0.5M updates/second

etc.