Chapter 2. CUDA Memory Management
2. CUDA Memory Management Most of the application’s performance will be bottlecked by memory-related constraints GPU RAM BW : 900GB/s (DDR3 ?) NV Visual Profiler Global memory is a staging area where all of the date gets copied from CPU memory. Global Memory(device memory) is visible to all of the threads in the kernel and also visible to CPu. Coalesced vs. uncoalesced global memory access coalesced global memory access : Sequential memory access is adjacent Warp Warp is a unit of thread scheduling/execution in SMs.