Update README by @vosen in #315
Fix test zluda_dump by @JoelleJS in #316
feat: enable LTO and codegen-units = 1 optimization by @zamazan4ik in #318
fix: missing inherits in a release-lto profile by @zamazan4ik in #319
Improve build system by @vosen in #329
LLVM unit tests by @JoelleJS in #324
Implement mode tracking for AMD GPU by @vosen in #342
Implement mul24 by @JoelleJS in #351
Explicitly fail compilation on ROCm 6.4 by @vosen in #361
Create infrastructure for performance libraries by @vosen in #363
Fix ROCm 6.4 failures by @vosen in #364
Work around broken AMD Adrenalin 25.5.1 driver by @vosen in #366
Redo logging to better log dark API and performance libraries by @vosen in #372
Fix mad.wide, replace external CUDA library in tests with our own by @vosen in #376
Implement cuGetProcAddress and cuGetProcAddress_v2 by @zluda-violet in #377
Implement runtime_callback_hooks_fn2 by @zluda-violet in #380
Implement cuModuleGetLoadingMode by @zluda-violet in #381
Implement cudart_interface_fn2 by @zluda-violet in #382
Add automated builds by @vosen in #358
Handle new attributes in cuDeviceGetAttribute by @zluda-violet in #383
Implement runtime_callback_hooks_fn6 by @zluda-violet in #386
Add fp saturation, fix various bugs in cvt instruction exposed by ptx_tests by @vosen in #379
Use integrity_check implementation by @zluda-violet in #387
Implement cuLibraryLoadData by @zluda-violet in #388
Fix bug in get_payload by @zluda-violet in #389
Remove trailing zeroes from end of ptx by @zluda-violet in #390
Error instead of infinite loop in derive_parser! by @zluda-violet in #391
Bump dependencies by @vosen in #392
Check LLVM IR for test_ptx! with no input/output by @zluda-violet in #394
Unified fatbin versions behind a single iterator. by @aiwhskruht in #398
Make derive_parser work with all optional arguments by @zluda-violet in #397
Read test files at runtime for development ergonomics by @zluda-violet in #395
Fix floating point min/max by @vosen in #399
Add warp-wide tests by @zluda-violet in #400
Add support for bar.red.and.pred by @zluda-violet in #402
Run unit tests on every commit by @vosen in #401
Add initialized check to protect zluda from calls during shutdown by @aiwhskruht in #404
Implement more CUDA driver API to enable simple cuda-samples by @aiwhskruht in #405
[WIP] Start working on PhysX 32bit by @vosen in #374
Update README.md by @zluda-violet in #407
Add support for multiple return arguments by @zluda-violet in #406
Enable sccache in Rust builds, publish prerelease builds by @vosen in #408
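
For readers unfamiliar with the Cargo settings named in #318 and #319, a custom LTO profile might look roughly like the fragment below. This is a minimal sketch built only from the names in those PR titles (release-lto, LTO, codegen-units = 1, inherits). The base profile and the exact lto value are assumptions, and the settings in the ZLUDA repository may differ.

```toml
# Hypothetical Cargo.toml fragment; not copied from the ZLUDA repository.
[profile.release-lto]
inherits = "release"   # assumed base profile; Cargo requires custom profiles to set `inherits`
lto = true             # assumed value; enables whole-program ("fat") link-time optimization
codegen-units = 1      # a single codegen unit per crate: slower builds, more optimization opportunities
```

A profile like this would be selected with `cargo build --profile release-lto`.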

New Contributors

@zamazan4ik made their first contribution in #318
@zluda-violet made their first contribution in #377
@aiwhskruht made their first contribution in #398

Full Changelog: v4...v5-preview.43

31 Dec 15:19

vosen

de870db

Version 4

This is the first release post-rollback and is very limited: only Geekbench is supported.

12 Feb 14:09

vosen

1b9ba2b

Version 3

Nobody expects the Red Team

Too many changes to list, but broadly:

Remove Intel GPU support from the compiler
Add AMD GPU support to the compiler
Remove Intel GPU host code
Add AMD GPU host code
More device instructions: from 40 to 68
More host functions: from 48 to 184
Add proof-of-concept implementation of the OptiX framework
Add minimal support for cuDNN, cuBLAS, cuSPARSE, cuFFT, NCCL, NVML
Improve ZLUDA launcher for Windows

22 Feb 17:17

vosen

4d3e37b

Version 2

The goal of version 2 was to fix end-to-end execution of Geekbench and improve Windows support:

Several new host-side functions are now supported (e.g. cuModuleLoadDataEx)
Several kernel-side bugs have been fixed (e.g. threadIdx/blockIdx are now handled correctly)
A minor improvement in generated code gave better I/O performance when reading and writing vector objects, improving performance by several percentage points in select Geekbench benchmarks
ZLUDA now ships its own injector (with_zluda.exe), which should make running ZLUDA on Windows much easier
Additionally, we have gained the ability to easily create traces of CUDA kernel execution, which makes enabling new workloads much easier
ZLUDA now has CI, which produces binaries on every pull request and commit

Special thanks to @take-cheeze, @nilsmartel and @ritschwumm for contributing to this release.
