Releases: vosen/ZLUDA
Version 5-preview.49
What's Changed
- Allow messages for error_todo by @zluda-violet in #415
Full Changelog: v5-preview.48...v5-preview.49
Version 5-preview.48
What's Changed
Full Changelog: v5-preview.47...v5-preview.48
Version 5-preview.47
Version 5-preview.46
What's Changed
- Handle `WARP_SZ` by @zluda-violet in #412
Full Changelog: v5-preview.45...v5-preview.46
Version 5-preview.45
What's Changed
- More descriptive message for unknown symbol by @zluda-violet in #411
Full Changelog: v5-preview.44...v5-preview.45
Version 5-preview.44
What's Changed
- Remove duplicate call to linker by @zluda-violet in #410
Full Changelog: v5-preview.43...v5-preview.44
Version 5-preview.43
What's Changed
- Update README by @vosen in #315
- Fix test zluda_dump by @JoelleJS in #316
- feat: enable LTO and codegen-units = 1 optimization by @zamazan4ik in #318
- fix: missing inherits in a release-lto profile by @zamazan4ik in #319
- Improve build system by @vosen in #329
- LLVM unit tests by @JoelleJS in #324
- Implement mode tracking for AMD GPU by @vosen in #342
- Implement mul24 by @JoelleJS in #351
- Explicitly fail compilation on ROCm 6.4 by @vosen in #361
- Create infrastructure for performance libraries by @vosen in #363
- Fix ROCm 6.4 failures by @vosen in #364
- Work around broken AMD Adrenalin 25.5.1 driver by @vosen in #366
- Redo logging to better log dark API and performance libraries by @vosen in #372
- Fix mad.wide, replace external CUDA library in tests with our own by @vosen in #376
- Implement cuGetProcAddress and cuGetProcAddress_v2 by @zluda-violet in #377
- Implement runtime_callback_hooks_fn2 by @zluda-violet in #380
- Implement cuModuleGetLoadingMode by @zluda-violet in #381
- Implement cudart_interface_fn2 by @zluda-violet in #382
- Add automated builds by @vosen in #358
- Handle new attributes in `cuDeviceGetAttribute` by @zluda-violet in #383
- Implement `runtime_callback_hooks_fn6` by @zluda-violet in #386
- Add fp saturation, fix various bugs in cvt instruction exposed by ptx_tests by @vosen in #379
- Use `integrity_check` implementation by @zluda-violet in #387
- Implement `cuLibraryLoadData` by @zluda-violet in #388
- Fix bug in get_payload by @zluda-violet in #389
- Remove trailing zeroes from end of ptx by @zluda-violet in #390
- Error instead of infinite loop in `derive_parser!` by @zluda-violet in #391
- Bump dependencies by @vosen in #392
- Check LLVM IR for `test_ptx!` with no input/output by @zluda-violet in #394
- Unified fatbin versions behind a single iterator. by @aiwhskruht in #398
- Make `derive_parser` work with all optional arguments by @zluda-violet in #397
- Read test files at runtime for development ergonomics by @zluda-violet in #395
- Fix floating point min/max by @vosen in #399
- Add warp-wide tests by @zluda-violet in #400
- Add support for `bar.red.and.pred` by @zluda-violet in #402
- Run unit tests on every commit by @vosen in #401
- Add initialized check to protect zluda from calls during shutdown by @aiwhskruht in #404
- Implement more CUDA driver API to enable simple cuda-samples by @aiwhskruht in #405
- [WIP] Start working on PhysX 32bit by @vosen in #374
- Update README.md by @zluda-violet in #407
- Add support for multiple return arguments by @zluda-violet in #406
- Enable sccache in Rust builds, publish prerelease builds by @vosen in #408
New Contributors
- @zamazan4ik made their first contribution in #318
- @zluda-violet made their first contribution in #377
- @aiwhskruht made their first contribution in #398
Full Changelog: v4...v5-preview.43
Version 4
Version 3
Nobody expects the Red Team
Too many changes to list, but broadly:
- Remove Intel GPU support from the compiler
- Add AMD GPU support to the compiler
- Remove Intel GPU host code
- Add AMD GPU host code
- More device instructions: from 40 to 68
- More host functions: from 48 to 184
- Add proof of concept implementation of OptiX framework
- Add minimal support of cuDNN, cuBLAS, cuSPARSE, cuFFT, NCCL, NVML
- Improve ZLUDA launcher for Windows
Version 2
The goal of version 2 was to fix end-to-end execution of GeekBench and improve Windows support:
- Several new host-side functions are supported now (e.g. cuModuleLoadDataEx)
- Several bugs have been fixed on the kernel side (e.g. threadIdx/blockIdx is now handled correctly)
- A minor improvement in generated code made reading and writing vector objects faster, which improved performance by several percentage points in select GeekBench benchmarks
- ZLUDA now ships its own injector (with_zluda.exe), which should make running ZLUDA on Windows much easier
- Additionally, we have gained the ability to easily create traces of CUDA kernel execution, which makes enabling new workloads much easier
- ZLUDA now has a CI, which produces binaries on every pull request and commit
Special thanks to @take-cheeze, @nilsmartel and @ritschwumm for contributing to this release.