metal: use shared buffers on eGPU #17866
Conversation
With ggml-org#15906, I noticed an important regression when using the Metal backend on an eGPU. This commit restores the previous behavior and adds an option to force its activation.
I'm not familiar with the concept of eGPU - is this running on an Intel Mac?
Looks like it, and an external GPU connected via Thunderbolt.
Yes, this is specific to Intel Macs when a desktop GPU is plugged in behind Thunderbolt.
Thanks. Would need to fix the iOS, tvOS and visionOS builds.
For sure. |
CI is still failing :/ |
My bad, I thought TARGET_OS_OSX was not defined for iOS, tvOS, and visionOS.
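For context, this is a common pitfall: TargetConditionals.h defines every TARGET_OS_* macro as either 0 or 1 on all Apple platforms, so the macro is always defined and only its value distinguishes macOS from iOS, tvOS and visionOS. A minimal sketch of the guard pattern (illustrative only, not the actual patch in this PR):

```objc
#include <TargetConditionals.h>

// TARGET_OS_OSX is defined (as 0 or 1) on every Apple platform, so its
// value has to be tested; `#ifdef TARGET_OS_OSX` would also be true on
// iOS, tvOS and visionOS.
#if TARGET_OS_OSX
    // macOS-only code path (e.g. eGPU-related device queries)
#else
    // iOS, tvOS and visionOS builds take this branch
#endif
```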
With #15906, I noticed an important regression when using the Metal backend on an eGPU.
This commit restores the previous behavior and adds an option to force its activation.
Before #15906, llama-bench on Gemma 3 gave me this kind of result:
So above 45 t/s on the pp test, and more than 5 t/s on the tg test.
After #15906, the pp test improved but the tg test was roughly halved.
Launching the benchmark with "Metal System Trace" in Instruments.app reveals some usage of the DMA1 channel, which introduced a lot of latency (at least, this is how I interpreted it).
With this PR, performance on eGPU is back to its previous level, and the change should not impact any other configuration (dGPU and M1-M5).
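To illustrate the idea, a check along these lines could be used to pick the buffer storage mode. This is a sketch under assumptions, not the PR's actual diff: the helper name `buffer_storage_mode` and the `force_shared` flag are hypothetical; it only shows how MTLDevice properties can select shared storage for removable eGPUs while leaving other configurations alone.

```objc
#import <Metal/Metal.h>
#include <TargetConditionals.h>

// Hypothetical helper (not the PR's actual code): pick the storage mode
// for backend buffers on a given Metal device.
static MTLResourceOptions buffer_storage_mode(id<MTLDevice> device, BOOL force_shared) {
    if (force_shared || device.hasUnifiedMemory) {
        // Unified memory (Apple Silicon) or explicit override: shared buffers.
        return MTLResourceStorageModeShared;
    }
#if TARGET_OS_OSX
    if (device.removable) {
        // Removable device, i.e. an eGPU behind Thunderbolt: keep shared
        // buffers to avoid the DMA traffic seen in the Metal System Trace.
        // (The `removable` property is macOS-only, hence the guard.)
        return MTLResourceStorageModeShared;
    }
#endif
    // Built-in discrete GPUs keep private buffers.
    return MTLResourceStorageModePrivate;
}
```

The macOS-only `removable` property also explains why such a change would need the TARGET_OS_OSX guard mentioned above to keep the iOS, tvOS and visionOS builds green.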