Cuckoo Switch
Page content
Cuckoo 알고리즘을 사용하여 Flow lookup과 flow update 성능을 높힌 것과 DPDK를 이용하여 패킷 처리 성능을 높힌 것
출처 : Scalable, High Performance Ethernet Forwarding with CuckooSwitch
DPDK
- DPDK를 이용한 IO 성능 개선한 것 외에 특이한 것은 없음.
Cuckoo hashing
대개 FIB update를 위해 RCU(Read Copy Update)를 사용함. 이 경우 완전한 정보를 갖는 additional entry가 필요
- 수정된 cuckoo algorithm을 기반으로 한 flow table 사용
Basic Cuckoo hashing
- ensures 50% table space utilization
4-way associative hash table has 95% utilization
- ‘A cool and practical alternative to traditional hash tables’
- Only two cache-sized parallel memory accesses are required for lookup
- What means ‘4-way’? Each hash bucket has 4 entries??? - FIXME Check the paper
메모리 사용량 감소 및 multiple reader와 single writer가 동시에 접근할 수 있게 하여 lock 사용없이 FIB update 성능 개선
- Compact and concurrent memcache with dumber caching and smarter hashing
- ‘Optimitic concurrent cuckoo hashing’
- More optimization for …. FIXME
Compiler reorerding barriers
- ‘a sequence of memory writes at one core are guaranteed to appear in the same order at all remote CPUs’
- If one cores write W1,W2 and W3 and the R1 from some core observes the effect of W3, then the R2 and R3 which is issued after R1 also observed the same effect
- Check the words “compiler reordering barriers”
- ensure the compiler does not reorder the store and read instructions using a compiler reordering barrier between the read operations which should keep the instruction order
__asm__ __volatile__("" ::: "memory")
in gcc- mark
volatile
for the fields to access
Batch Hash table lookup
- If the size of the hash table cannot fit the fast CPU cache size, then performance drops dramatically.
- For this, batching 16 packets for one lookup
- Combine all the packets in the buffers as a single batch and perform hash table lookup at the same time
- FIXME - How to batch lookups
Prefetch
- FIXME
Performance
- 10^9 entries
- 350M lookups/second
- 0.5M updates/second