Here I keep track of questions I wanted to answer while studying CUDA C.
If CUDA C manages the distribution of work across threads and blocks, what are the implications of choosing different block and thread sizes?:
The hardware schedules threads in warps of 32, so block sizes that are a multiple of 32 avoid partially filled warps. Block size also bounds occupancy: each streaming multiprocessor has a limited budget of registers, shared memory, and resident threads, so very small blocks underuse it while very large blocks reduce the scheduler's flexibility. Finally, the grid must cover all the data, so the block count is usually rounded up and the kernel bounds-checked.
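A minimal sketch of this trade-off, assuming a simple element-wise kernel (`scale` and the array sizes here are illustrative, not from any particular codebase): the result is identical for every launch configuration, only the scheduling changes.

```cuda
#include <cuda_runtime.h>

__global__ void scale(float *x, int n) {
    // Global index: each thread handles one element, regardless of
    // how the launch is split into blocks and threads.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;  // bounds check: the grid may overshoot n
}

int main() {
    const int n = 1 << 20;
    float *x;
    cudaMalloc(&x, n * sizeof(float));

    // Same total work, different splits. Multiples of the warp size
    // (32) avoid partially filled warps; tiny blocks limit occupancy,
    // maximal blocks limit how many can be resident per SM.
    int sizes[] = {32, 128, 256, 1024};
    for (int k = 0; k < 4; ++k) {
        int threads = sizes[k];
        int blocks = (n + threads - 1) / threads;  // round up to cover n
        scale<<<blocks, threads>>>(x, n);
    }
    cudaDeviceSynchronize();
    cudaFree(x);
    return 0;
}
```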
If shared memory latency is lower than uncached global memory latency, how can array accesses be made more cache friendly?:
Have the threads of a block cooperatively stage a tile of the array into shared memory using coalesced global loads (consecutive threads reading consecutive addresses, so each warp's access collapses into as few memory transactions as possible), synchronize, and then do the reuse or reordering out of shared memory instead of global memory.
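The classic illustration of this pattern is a tiled matrix transpose: a naive transpose must make either its reads or its writes strided, while staging a tile in shared memory keeps both coalesced. A sketch, assuming a square `n x n` row-major matrix:

```cuda
#define TILE 32

// Tiled transpose: both the global read and the global write are
// coalesced, because the reordering happens in shared memory.
__global__ void transpose(const float *in, float *out, int n) {
    // +1 padding shifts each row to a different bank, avoiding
    // shared-memory bank conflicts on the column-wise read below.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n)
        tile[threadIdx.y][threadIdx.x] = in[y * n + x];  // coalesced load

    __syncthreads();  // whole tile staged before anyone reads it back

    // Swap the block coordinates so the write is also coalesced.
    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < n && y < n)
        out[y * n + x] = tile[threadIdx.x][threadIdx.y];
}
```

Launched with `dim3 block(TILE, TILE)` and a grid rounded up to cover the matrix, each block pays the transpose's unavoidable reshuffling cost in shared memory, where latency is far lower than in global memory.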