Questions

Here I keep track of some questions I really wanted to answer while studying CUDA C.

  1. If CUDA C manages the distribution of work across threads and blocks, what are the implications of choosing different block and thread sizes? (See the first sketch after this list.)

    ???
    
  2. If the shared memory latency is roughly 100× lower than uncached global memory latency, how can access to the array be made more cache friendly? (See the second sketch after this list.)

    ???
    
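For the first question, the sketch below is a starting point rather than an answer. It re-implements a minimal vector-addition kernel (my own restatement, not necessarily identical to the one in the previous section) and launches it with several block sizes while keeping the total number of threads roughly equal to n; the problem size n and the values in blockSizes are only illustrative.

    #include <stdio.h>
    #include <cuda_runtime.h>

    /* Naive element-wise addition: each thread handles one element.
       The bounds check guards the case where n is not a multiple of
       the block size. */
    __global__ void add(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }

    int main(void)
    {
        const int n = 1 << 20;                        /* illustrative problem size */
        float *a, *b, *c;
        cudaMallocManaged(&a, n * sizeof(float));     /* unified memory keeps the sketch short */
        cudaMallocManaged(&b, n * sizeof(float));
        cudaMallocManaged(&c, n * sizeof(float));
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        /* Same kernel, same total amount of work, different splits between
           blocks and threads per block. */
        const int blockSizes[] = { 32, 128, 256, 1024 };
        for (int k = 0; k < 4; ++k) {
            int threads = blockSizes[k];
            int blocks  = (n + threads - 1) / threads;
            add<<<blocks, threads>>>(a, b, c, n);
            cudaDeviceSynchronize();
            printf("blockDim = %4d, gridDim = %6d, c[0] = %.1f\n",
                   threads, blocks, c[0]);
        }

        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

Every launch computes the same result; what changes is only how the work is split between blocks and threads, which is exactly the knob the question is about.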

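For the second question, this is not an answer either, but a sketch of the pattern usually meant by using shared memory as a software-managed cache, shown here with a hypothetical 1D stencil (not an example from these notes): the threads of a block cooperatively copy a tile of the input array, plus a halo, into __shared__ memory, synchronize, and then read their neighbours from the tile instead of re-reading global memory. The names RADIUS, BLOCK and stencil1d are made up for this sketch, and the kernel assumes it is launched with exactly BLOCK threads per block.

    #define RADIUS 3
    #define BLOCK  256

    /* 1D stencil: out[i] = sum of in[i-RADIUS] .. in[i+RADIUS].
       Each block stages its tile of the input, plus a halo of RADIUS
       elements on each side, into shared memory exactly once. */
    __global__ void stencil1d(const float *in, float *out, int n)
    {
        __shared__ float tile[BLOCK + 2 * RADIUS];

        int g = blockIdx.x * blockDim.x + threadIdx.x;   /* global index      */
        int s = threadIdx.x + RADIUS;                    /* index in the tile */

        /* Cooperative load of the tile body (zero-padded past the ends) ... */
        tile[s] = (g < n) ? in[g] : 0.0f;
        /* ... and of the halo cells at both ends of the block. */
        if (threadIdx.x < RADIUS) {
            tile[s - RADIUS] = (g >= RADIUS)   ? in[g - RADIUS] : 0.0f;
            tile[s + BLOCK]  = (g + BLOCK < n) ? in[g + BLOCK]  : 0.0f;
        }
        __syncthreads();   /* the whole tile is now visible to every thread */

        /* Each thread now reads its 2*RADIUS + 1 inputs from shared memory
           instead of issuing that many separate global loads. */
        if (g < n) {
            float sum = 0.0f;
            for (int k = -RADIUS; k <= RADIUS; ++k)
                sum += tile[s + k];
            out[g] = sum;
        }
    }

Launched as, e.g., stencil1d<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(d_in, d_out, n). Whether this pattern helps depends on there being reuse: in the naive vector addition from the previous section each element is read only once, so shared memory would have nothing to cache.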