-
Notifications
You must be signed in to change notification settings - Fork 300
[libcu++] Dynamically load CUDA library instead of using the runtime #6899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test 91bea5b |
95edf73 to
319d34b
Compare
|
/ok to test 319d34b |
This comment has been minimized.
This comment has been minimized.
|
/ok to test 35dd82a |
This comment has been minimized.
This comment has been minimized.
| # if _CCCL_OS(WINDOWS) | ||
| HMODULE m_cudaDriverLibrary = LoadLibraryEx("nvcuda.dll", nullptr, LOAD_LIBRARY_SEARCH_SYSTEM32); | ||
| if (m_cudaDriverLibrary == nullptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: This is loading every single time the function is called, should this be a function local static?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only call this function once to initialize a function local static __get_proc_addr_fn in __get_driver_entry_point. I don't think we will ever have to call this function more than once, so I don't think there is a reason to store the loaded library handle. But its also not expensive, so I would be fine to store it as well if there is an argument to do so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think we need to store it, just make it
| # if _CCCL_OS(WINDOWS) | |
| HMODULE m_cudaDriverLibrary = LoadLibraryEx("nvcuda.dll", nullptr, LOAD_LIBRARY_SEARCH_SYSTEM32); | |
| if (m_cudaDriverLibrary == nullptr) | |
| # if _CCCL_OS(WINDOWS) | |
| static auto m_cudaDriverLibrary = ::LoadLibraryExA("nvcuda.dll", nullptr, LOAD_LIBRARY_SEARCH_SYSTEM32); | |
| if (m_cudaDriverLibrary == nullptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this function also be _CCCL_PUBLIC_HOST_API so we really query it only once across TUs?
| # else | ||
| const char* m_cudaDriverLibraryName = "libcuda.so.1"; | ||
| # endif | ||
| void* m_cudaDriverLibrary = dlopen(m_cudaDriverLibraryName, RTLD_NOW); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto should this be function local static?
4055340 to
193aad2
Compare
|
/ok to test 193aad2 |
This comment has been minimized.
This comment has been minimized.
|
/ok to test 39b3355 |
This comment has been minimized.
This comment has been minimized.
|
/ok to test bcb2d5f |
This comment has been minimized.
This comment has been minimized.
b34019d to
4675dfa
Compare
|
/ok to test 4675dfa |
This comment has been minimized.
This comment has been minimized.
4675dfa to
8ed886b
Compare
|
/ok to test 8ed886b |
8ed886b to
8fda00c
Compare
|
/ok to test 8fda00c |
This comment has been minimized.
This comment has been minimized.
🥳 CI Workflow Results🟩 Finished in 7h 36m: Pass: 100%/91 | Total: 2d 06h | Max: 2h 56m | Hits: 92%/195554See results here. |
| # if _CCCL_OS(WINDOWS) | ||
| HMODULE m_cudaDriverLibrary = LoadLibraryEx("nvcuda.dll", nullptr, LOAD_LIBRARY_SEARCH_SYSTEM32); | ||
| if (m_cudaDriverLibrary == nullptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this function also be _CCCL_PUBLIC_HOST_API so we really query it only once across TUs?
| const char* m_cudaDriverLibraryName = "libcuda.so"; | ||
| # else | ||
| const char* m_cudaDriverLibraryName = "libcuda.so.1"; | ||
| # endif | ||
| void* m_cudaDriverLibrary = ::dlopen(m_cudaDriverLibraryName, RTLD_NOW); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, use __my_var style
We can't support minor version compatibility in 12.X releases without breaking dynamic runtime use case described here: #5970. Its because in older 12.X releases there is no
cudaGetDriverEntryPointByVersionand the non versioned one can't support MVC.This PR switches to instead dynamically load the CUDA library and fetch
cuGetProcAddressfrom it, instead of usingcudaGetDriverEntryPointOn the windows side it uses
LibraryLoad/GetProcAddress, which requireswindows.hinclude. It's a pretty heavy header, but should also be commonly included anyway. Long term we can think about an alternative