BWUtil is computed based on GPU Time, even when Batch GPU is available #194

ahendriksen · 2024-11-28T14:16:30Z

Based on the output of nvbench, it looks like the global memory bandwidth and bandwidth utilization are calculated based on GPU Time.

Below is an excerpt from a recent benchmark. There are two runs:

- the first runs a kernel once
- the second runs the kernel 1000 times in a CUDA graph

Per kernel, the Batch GPU time of the two runs is similar (~10 us; the raw values differ by a factor of 1000 because the second run executes 1000 kernels per sample), but the GPU Time differs (~15 us for the first run and ~9 us per kernel for the second). If the bandwidth were calculated from the Batch GPU time, we would expect nearly the same bandwidth in both rows. Instead, the reported bandwidth differs by a factor of about 1.6 between the two rows (1.963 TB/s vs. 3.146 TB/s), which matches the ratio of the GPU Times.

| iters_per_graph | Samples | CPU Time  | Noise  | GPU Time  | Noise | GlobalMem BW | BWUtil | Samples | Batch GPU | 
|-----------------|---------|-----------|--------|-----------|-------|--------------|--------|---------|-----------|
|               0 |  33424x | 24.909 us | 66.85% | 14.961 us | 2.95% |   1.963 TB/s | 23.96% |  48703x | 10.267 us |
|            1000 |     54x |  9.343 ms |  0.13% |  9.333 ms | 0.07% |   3.146 TB/s | 38.41% |     56x |  9.310 ms |
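
The figures in the table can be cross-checked with a little arithmetic. The sketch below is not nvbench code. It assumes that the reported bandwidth equals bytes moved divided by GPU Time, that TB/s means 10^12 bytes per second, and that the first row corresponds to a single kernel launch per sample. The names `rows`, `implied_bytes_per_kernel`, and `batch_bandwidth` are made up for illustration.

```python
# Values copied from the table (times in seconds, bandwidth in bytes/s).
# "kernels" is the number of kernel launches per sample (assumed 1 for row one).
rows = [
    {"kernels": 1, "gpu_time": 14.961e-6, "batch_gpu": 10.267e-6, "bw": 1.963e12},
    {"kernels": 1000, "gpu_time": 9.333e-3, "batch_gpu": 9.310e-3, "bw": 3.146e12},
]


def implied_bytes_per_kernel(row):
    # Assumption: reported bandwidth = bytes per sample / GPU Time.
    return row["bw"] * row["gpu_time"] / row["kernels"]


def batch_bandwidth(row):
    # Same byte count, but divided by the Batch GPU time instead.
    bytes_per_sample = implied_bytes_per_kernel(row) * row["kernels"]
    return bytes_per_sample / row["batch_gpu"]


for row in rows:
    mb = implied_bytes_per_kernel(row) / 1e6
    tbs = batch_bandwidth(row) / 1e12
    print(f"kernels={row['kernels']}: {mb:.2f} MB per kernel, "
          f"batch-based bandwidth {tbs:.3f} TB/s")
```

Under these assumptions both rows imply roughly 29.4 MB moved per kernel, which is consistent with the bandwidth having been derived from GPU Time. Dividing by the Batch GPU time instead gives about 2.86 TB/s and 3.15 TB/s, a much smaller gap than the reported one.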

It would be more accurate to calculate the bandwidth based on the Batch GPU time when it is available.
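
A minimal sketch of the proposed behavior, in Python rather than in nvbench's own C++: prefer the Batch GPU time when a batch measurement exists, and fall back to GPU Time otherwise. The function names, parameters, and the idea of passing the peak bandwidth explicitly are assumptions for illustration, not the nvbench API.

```python
from typing import Optional


def global_mem_bandwidth(bytes_per_sample: float,
                         gpu_time_s: float,
                         batch_gpu_time_s: Optional[float] = None) -> float:
    """Return bytes per second, preferring the batch time when it is given.

    All times are per sample, in seconds. Hypothetical helper, not nvbench code.
    """
    elapsed = batch_gpu_time_s if batch_gpu_time_s is not None else gpu_time_s
    if elapsed <= 0.0:
        raise ValueError("elapsed time must be positive")
    return bytes_per_sample / elapsed


def bw_util(bandwidth: float, peak_bandwidth: float) -> float:
    """Bandwidth utilization as a fraction of the device's peak bandwidth."""
    if peak_bandwidth <= 0.0:
        raise ValueError("peak bandwidth must be positive")
    return bandwidth / peak_bandwidth
```

For example, `global_mem_bandwidth(n_bytes, gpu_time, batch_time)` would use `batch_time`, while omitting the third argument keeps today's GPU Time based result.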

fbusato · 2024-12-02T19:58:49Z

It would be even better to use the CUPTI Profiling API for this purpose: https://docs.nvidia.com/cupti/main/main.html#metrics-table
