v0.4.2: Training cache fixes, Qwen3 Embedding support added, vLLM v1 API

Latest


tengomucho released this 20 Nov 10:30

0b07298

What's Changed

Inference

Fix input slots exhaustion in vLLM plugin by @dacorvo in #1028
Agentic example by @tengomucho in #1030
perf: move accuracy benchmark to vllm by @dacorvo in #1031
Add support for Qwen3 embedding model by @dacorvo in #1023
Update vllm version to 0.11.0 by @dacorvo in #1027
feat: Add encode and similarity of Sentence transformers by @JingyaHuang in #1012
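In sentence-transformers, `similarity` scores pairs of embeddings produced by `encode`, and it uses cosine similarity by default. The sketch below is dependency-free and only illustrates that computation on toy vectors. It is not the optimum-neuron API, and `cosine_similarity_matrix` is a hypothetical helper.

```python
import math


def cosine_similarity_matrix(a, b):
    """Return pairwise cosine similarities between the rows of a and the rows of b.

    Hypothetical helper for illustration; assumes no row is a zero vector
    (a zero vector would raise ZeroDivisionError).
    """
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    return [
        [sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v)) for v in b]
        for u in a
    ]


# Toy 3-dimensional vectors standing in for encode() output.
queries = [[1.0, 0.0, 0.0]]
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
print(cosine_similarity_matrix(queries, docs))
```

Each row of the result corresponds to one query and each column to one document, so higher values mark closer matches.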

Training

Metrics for training by @michaelbenayoun in #982
Update trl from 0.11.4 to the latest release, 0.24.0, by @michaelbenayoun in #1000
Add cache features to the NeuronTrainer by @michaelbenayoun in #1026

Other

Sync with transformers 4.57.1 by @michaelbenayoun in #1016
ci(vllm): login to docker by @tengomucho in #1010
Fix small typos by @tengomucho in #1021
Bump optimum to 2.0 by @JingyaHuang in #1018
Unpin protobuf version by @JingyaHuang in #1014
Fixing link in error message by @jimburtoft in #1029
fix(vllm): fix base_neuron_llm_config fixture by @tengomucho in #1032

Full Changelog: v0.4.1...v0.4.2

Contributors

dacorvo, tengomucho, and 3 other contributors
