Questions
=========

Here I keep track of some questions I really wanted to answer while studying CUDA C.

#. If CUDA C takes care of distributing the work across threads and blocks, what are the implications of choosing different grid sizes (number of blocks) and block sizes (threads per block)? ::

      ???

#. If shared memory latency is `\approx 100 \times` lower than uncached global memory latency, how can accesses to an array be made more cache friendly? ::

      ???