Contributor

@pciolkosz pciolkosz commented Dec 5, 2025

We can't support minor version compatibility (MVC) in 12.X releases without breaking the dynamic runtime use case described in #5970. That's because older 12.X releases have no cudaGetDriverEntryPointByVersion, and the non-versioned cudaGetDriverEntryPoint can't support MVC.
This PR instead dynamically loads the CUDA driver library and fetches cuGetProcAddress from it, rather than using cudaGetDriverEntryPoint.
On Windows it uses LoadLibrary/GetProcAddress, which requires including windows.h. It's a pretty heavy header, but it is likely to be commonly included anyway. Long term we can think about an alternative.
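
For readers skimming the thread, the approach described above can be sketched roughly as follows. This is a minimal POSIX-only illustration, not the code in this PR: the `demo_` names and the opaque `demo_generic_fn` type are hypothetical, and a real caller would cast the result to the `cuGetProcAddress` signature declared in `cuda.h`.

```cpp
#include <dlfcn.h>

#include <cassert>

// Opaque stand-in for the driver entry-point type (assumption for this sketch).
using demo_generic_fn = void (*)();

// Returns the address of `symbol` in `library`, or nullptr if either the
// library or the symbol cannot be found.
inline demo_generic_fn demo_load_symbol(const char* library, const char* symbol)
{
  assert(library != nullptr && symbol != nullptr);
  void* handle = ::dlopen(library, RTLD_NOW);
  if (handle == nullptr)
  {
    return nullptr; // library not installed or not on the loader path
  }
  // The handle is intentionally never closed, so the returned pointer stays
  // valid for the lifetime of the process.
  return reinterpret_cast<demo_generic_fn>(::dlsym(handle, symbol));
}

// Looks up cuGetProcAddress in the driver library by name, without linking
// against the CUDA runtime. Returns nullptr when no driver is present.
inline demo_generic_fn demo_get_cu_get_proc_address()
{
  return demo_load_symbol("libcuda.so.1", "cuGetProcAddress");
}
```

On a machine without a CUDA driver, `demo_get_cu_get_proc_address()` simply returns `nullptr`, so the caller decides how to report the failure.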

Contributor

copy-pr-bot bot commented Dec 5, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Dec 5, 2025
@pciolkosz
Contributor Author

/ok to test 91bea5b

@pciolkosz pciolkosz force-pushed the dlopen_libcuda_so branch 2 times, most recently from 95edf73 to 319d34b Compare December 5, 2025 22:38
@pciolkosz
Contributor Author

/ok to test 319d34b

@pciolkosz
Contributor Author

/ok to test 35dd82a

Comment on lines 56 to 58
# if _CCCL_OS(WINDOWS)
HMODULE m_cudaDriverLibrary = LoadLibraryEx("nvcuda.dll", nullptr, LOAD_LIBRARY_SEARCH_SYSTEM32);
if (m_cudaDriverLibrary == nullptr)
Contributor

Critical: This loads the library every time the function is called. Should this be a function-local static?

Contributor Author

We only call this function once, to initialize the function-local static __get_proc_addr_fn in __get_driver_entry_point. I don't think we will ever have to call it more than once, so I don't see a reason to store the loaded library handle. But it's also not expensive, so I would be fine with storing it if there is an argument for doing so.

Contributor

I don't think we need to store it; just make it static:

Suggested change
- # if _CCCL_OS(WINDOWS)
- HMODULE m_cudaDriverLibrary = LoadLibraryEx("nvcuda.dll", nullptr, LOAD_LIBRARY_SEARCH_SYSTEM32);
- if (m_cudaDriverLibrary == nullptr)
+ # if _CCCL_OS(WINDOWS)
+ static auto m_cudaDriverLibrary = ::LoadLibraryExA("nvcuda.dll", nullptr, LOAD_LIBRARY_SEARCH_SYSTEM32);
+ if (m_cudaDriverLibrary == nullptr)
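
To illustrate the point being discussed, here is a small, self-contained sketch of how a function-local static runs its initializer only once, no matter how often the enclosing function is called. The names `demo_expensive_load`, `demo_get_library`, and `demo_load_count` are hypothetical stand-ins; `demo_expensive_load` plays the role of `LoadLibraryExA` or `dlopen` and is not part of the PR.

```cpp
#include <cassert>

// Hypothetical counter used to observe how often the "load" actually runs.
inline int& demo_load_count()
{
  static int count = 0;
  return count;
}

// Stand-in for LoadLibraryExA / dlopen: returns a dummy non-null handle and
// records that it was called.
inline const void* demo_expensive_load()
{
  static const int dummy_library = 0;
  ++demo_load_count();
  return &dummy_library;
}

inline const void* demo_get_library()
{
  // The initializer runs on the first call only. Since C++11 this
  // initialization is also guaranteed to be thread-safe.
  static const void* handle = demo_expensive_load();
  assert(handle != nullptr);
  return handle;
}
```

Calling `demo_get_library()` repeatedly returns the same handle while the load count stays at one. Whether that single initialization is also shared across translation units depends on the linkage and visibility of the enclosing function, which is what the follow-up question about `_CCCL_PUBLIC_HOST_API` is getting at.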

Contributor

Shouldn't this function also be _CCCL_PUBLIC_HOST_API so we really query it only once across TUs?

# else
const char* m_cudaDriverLibraryName = "libcuda.so.1";
# endif
void* m_cudaDriverLibrary = dlopen(m_cudaDriverLibraryName, RTLD_NOW);
Contributor

Ditto: should this be a function-local static?

@pciolkosz
Contributor Author

/ok to test 193aad2

@pciolkosz
Contributor Author

/ok to test 39b3355

@pciolkosz
Contributor Author

/ok to test bcb2d5f

@pciolkosz
Contributor Author

/ok to test 4675dfa

@pciolkosz
Contributor Author

/ok to test 8ed886b

@pciolkosz
Contributor Author

/ok to test 8fda00c

@pciolkosz pciolkosz requested a review from davebayer December 10, 2025 01:37
@pciolkosz pciolkosz marked this pull request as ready for review December 10, 2025 01:39
@pciolkosz pciolkosz requested a review from a team as a code owner December 10, 2025 01:39
@cccl-authenticator-app cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Dec 10, 2025
@github-actions
Contributor

🥳 CI Workflow Results

🟩 Finished in 7h 36m: Pass: 100%/91 | Total: 2d 06h | Max: 2h 56m | Hits: 92%/195554


Comment on lines +69 to +73
const char* m_cudaDriverLibraryName = "libcuda.so";
# else
const char* m_cudaDriverLibraryName = "libcuda.so.1";
# endif
void* m_cudaDriverLibrary = ::dlopen(m_cudaDriverLibraryName, RTLD_NOW);
Contributor

Please use the __my_var naming style.

@github-project-automation github-project-automation bot moved this from In Review to In Progress in CCCL Dec 10, 2025
4 participants