Check average clock frequency during benchmarks #193
@ahendriksen, do you happen to know whether we can use those APIs instead of the approach outlined in the code above?
We can't: according to the API docs
Yes, but would
It could. There is one issue that the NVML APIs do not help with, and that is checking whether something happened over time. They return an instantaneous result. If you run a benchmark for 30 seconds, it doesn't matter what the clock throttle reason or the clocks are at the end of the benchmark. What matters is the average clock frequency during those 30 seconds. If you can additionally get clock throttle reasons, that would be nice, but it is not necessary. The throttle reason is what you want to know when debugging the hardware, for instance to determine whether the throttling happened due to thermal or power constraints. It doesn't help with debugging software.
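To make the instantaneous-versus-average distinction concrete, here is a minimal C++ sketch over a made-up, piecewise-constant clock trace. The trace values are illustrative, not measurements, and the `Segment` type and the function names are hypothetical. The sketch integrates the trace into elapsed SM clock ticks, which is what a counter read before and after the run observes, and compares the resulting average with a single reading taken at the end of the run, which is all an instantaneous query reports.

```cpp
#include <cassert>
#include <vector>

// One interval of a hypothetical clock trace: the SM clock runs at
// freq_mhz for duration_s seconds.
struct Segment {
  double duration_s;
  double freq_mhz;
};

// Elapsed SM clock ticks over the trace: frequency integrated over time.
// This is the quantity a before/after tick counter captures.
double elapsed_ticks(const std::vector<Segment>& trace) {
  double ticks = 0.0;
  for (const Segment& s : trace) ticks += s.freq_mhz * 1e6 * s.duration_s;
  return ticks;
}

// Average frequency over the whole run in MHz: elapsed ticks / elapsed time.
double average_mhz(const std::vector<Segment>& trace) {
  double seconds = 0.0;
  for (const Segment& s : trace) seconds += s.duration_s;
  assert(seconds > 0.0);
  return elapsed_ticks(trace) / seconds / 1e6;
}

// What a single instantaneous query at the end of the run would report.
double final_reading_mhz(const std::vector<Segment>& trace) {
  assert(!trace.empty());
  return trace.back().freq_mhz;
}
```

For a made-up 30-second run `{{10, 1980}, {10, 1200}, {10, 1980}}` (full clocks, a 10-second throttled dip, then recovery), `average_mhz` gives 1720 MHz while `final_reading_mhz` gives 1980 MHz. The dip is invisible to the end-of-run reading but shows up in the tick-based average.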
@ahendriksen I always love how well you can specify a problem! Thx. So let's implement the approach in the code above and check whether NVML can give us some useful additional diagnostics.
@ahendriksen sorry, why
As you say, it is indeed not impossible. However, it's not a great solution for several reasons:
The proposed solution only requires running a kernel once before and once after the benchmark. It is widely used within Nvidia, and it works. Is there a specific reason we would want to exhaust every NVML-based option before considering anything else?
I'm just trying to understand the pros and cons of these approaches.
You can measure clocks before and after, as in the custom kernel approach.
The polling interval is quite short, on the order of microseconds. Doing before/after is even worse, both with the custom kernel and with
I have seen the
The kernel is measuring elapsed clock ticks. NVML is measuring clock frequency.
Just to clarify, in the code that Bernhard pasted,
Sometimes benchmark systems are unstable due to external factors, and GPUs cannot sustain their clock frequency during a benchmark. This leads to incorrect results.
NVBench should monitor the clock frequency during benchmarking and detect such conditions. One way is to query the global timer and the SM clock before and after the benchmark, and compute the average frequency:
where `f` launches the kernel to benchmark. If the computed `clock_rate` is off from the expected value, we should issue a warning.
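The host-side half of this check can be sketched in C++. This is a minimal sketch under stated assumptions, not NVBench's implementation.

- `ClockSample`, `average_clock_rate_hz`, and `clock_rate_is_off` are hypothetical names.
- The device-side sampling kernel is not shown. The sketch assumes a one-thread kernel launched right before and right after `f` records the global timer (the PTX `%globaltimer` register, a nanosecond timer read via inline assembly) and the SM cycle counter (`clock64()`).
- It assumes both samples come from the same SM, because `clock64()` is a per-multiprocessor counter.
- The relative tolerance is an assumed knob; the issue only says to warn when the rate is "off".

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical snapshot recorded by a tiny kernel before/after the benchmark:
// global timer in nanoseconds and the SM cycle counter (same SM both times).
struct ClockSample {
  std::uint64_t global_timer_ns;
  std::uint64_t sm_clock_cycles;
};

// Average SM clock rate in Hz over the interval between the two samples:
// elapsed cycles divided by elapsed wall time.
double average_clock_rate_hz(const ClockSample& before, const ClockSample& after) {
  assert(after.global_timer_ns > before.global_timer_ns);
  assert(after.sm_clock_cycles >= before.sm_clock_cycles);
  const double cycles =
      static_cast<double>(after.sm_clock_cycles - before.sm_clock_cycles);
  const double seconds =
      static_cast<double>(after.global_timer_ns - before.global_timer_ns) * 1e-9;
  return cycles / seconds;
}

// True if the measured rate deviates from the expected rate by more than
// rel_tolerance (e.g. 0.05 for 5%); the caller would then emit a warning.
bool clock_rate_is_off(double measured_hz, double expected_hz, double rel_tolerance) {
  assert(expected_hz > 0.0);
  return std::fabs(measured_hz - expected_hz) / expected_hz > rel_tolerance;
}
```

Here is a worked example. If 3,000,000 cycles elapse over 2,000,000 ns (2 ms), `average_clock_rate_hz` returns 1.5e9 Hz. With an expected 1.5 GHz and a 5% tolerance, that rate passes, while a measured 1.2 GHz would be flagged. Using a relative rather than absolute tolerance keeps the same threshold meaningful across GPUs with different base clocks.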