🚀 New Features Highlights
Batch API & Multimodal and other OpenAI compatible API Surface
- Batch API Support: Add OpenAI-style Batch API with simple LLM workers, Envoy/Gateway integration, JSONL & File List support, job pool sizing, and robust validation to safely offload large asynchronous workloads. (#1298, #1617, #1671, #1698, #1700, #1701)
- Embeddings API and Moltimodal API: Introduce OpenAI-compatible embeddings endpoint so online inference, search, and RAG traffic can share the same AIBrix control plane and routing. (#1570) Support multimodality deployments and for image/video generation for other engines. (#1678, #1679, #1603, #1584)
- Files API & Unified Storage: Implement OpenAI Files API plus a pluggable storage layer (local, S3, TOS, Redis metadata) to standardize artifact and batch job management across backends. (#1583, #1571)
AIBrix KVCache Offloading frameworks & Connectors:
- High-Performance KVCache: Adds GDR support, optimized collective communications, configurable max sequence length and batched tokens, multi-threading for higher concurrency, and block-hash based APIs plus external cache handles for flexible distributed deployments. (#1411, #1446, #1453, #1451, #1627, #1628, #1545, #1531, #1542)
- Deep Engine Integrations: Provide official AIBrix KVCache Dockerfiles and integration paths for vLLM and SGLang plus correctness fixes (head size, metrics, types) to make KV offloading a first-class option. (#1641, #1696, #1705, #1473, #1450, #1689)
Production-Grade Prefill/Decode (P/D) Orchestration Support:
- New StormService Primitives: Add PodSet API, PodGroup support, FullRecreate strategy, role upgrade sequences, roleStatuses, and richer RoleSet/PodSet fields to model multi-pod workers, shard groups, and safer rollout/rollback for complex topologies. (#1475, #1506, #1511, #1432, #1599, #1560)
- P/D-Aware & Topology-Aware Routing: Prefer P/D workers in the same RoleSet in replication mode, score candidates by locality/load, and harden PD routing behavior for Nixl-based setups. (#1409, #1634, #1429, #1601, #1703, #1693)
- Role-Level Autoscaling for StormService: Introduced the "subTargetSelector" field in the PodAutoscaler API, allowing independent autoscaling of specific roles (e.g., prefill, decode) within a StormService resource, particularly in pooled mode. (#1625)
📊 Feature Enhancements
- Unified Runtime & Metadata: Migrate metadata server from golang to Python for a simpler, lighter control path. Add liveness/readiness probes and shrink runtime image sizes. Improve downloader reliability and recursive object-store fetch support. (#1391, #1639, #1548, #1702, #1571)
- LoRA & Model Adapter Reliability: Support adapter scaling to desired replicas, refactor replica management, add wrappers, and enable LoRA downloading via the runtime to stabilize multi-adapter hosting.
(#1132, #1472, #1670, #1680, #1537, #1541) - Autoscaling: Unify and harden metrics fetching by adding retryable RestMetricsFetcher, shared client/aggregator and fixing race-condition for configuration updates (#1466, #1487, #1620, #1621, #1709), Tune KPA defaults, support metric label selectors, and ensure PodAutoscaler emits events only when replica counts actually change. (#1624, #1629, #1630) scaling history decision has been supported in the status spec (#1618)
- AIBrixRuntime Injection: Deployment & StormService webhooks and wrapper libraries to auto-inject the runtime sidecar, standardizing metrics, downloads, and admin controls across engines.
(#1403, #1457, #1543, #1681, #1561)
📦 Installation & Tooling & CI
- Helm & Installation: Strengthen the AIBrix Helm chart as the recommended deployment path by adding dedicated chart CI and fixes (#1370, #1424), enriching Chart.yaml metadata (#1414), introducing values.schema.json for input validation (#1415), supporting imagePullSecrets configuration (#1522), and resolving duplicate label issues for Flux Helm Controller compatibility (#1615). Made KubeRay optional for AIBrix installations if you do not use RayclusterFleet API(#1724)
🐞 Critical Bug Fixes
- Fixes StormService headless Service ownership and DNS behavior by setting proper ownerReferences and PublishNotReadyAddresses. (#1441, #1442)
- Fixes incorrect naming for AIBRIX_MODEL_GPU_PROFILE_CACHING_FLAG to ensure configuration consistency. (#1427)
- Fixes KVCache stability issues by preventing panic when watcher or metadata are not set in kvcache.spec. (#1526)
- Fixes PodAutoscaler and metrics correctness by emitting events only on replica changes, aggregating resources across all containers, handling optional MetricSource fields, validating multiple PodAutoscalers targeting the same workload, and ensuring PodSet autoscaler collects metrics from rank0. (#1630, #1643, #1648, #1662, #1704)
New Contributors
- @JonathonShea made their first contribution in #1427
- @bigerous made their first contribution in #1442
- @jiangxiaobin96 made their first contribution in #1431
- @mayooot made their first contribution in #1496
- @zyfy29 made their first contribution in #1505
- @zhengkezhou1 made their first contribution in #1502
- @tianzhiqiang3 made their first contribution in #1566
- @atakli made their first contribution in #1574
- @jwjwjw3 made their first contribution in #1573
- @lx1036 made their first contribution in #1586
- @chethanuk made their first contribution in #1558
- @baozixiaoxixi made their first contribution in #1608
- @TylerGillson made their first contribution in #1615
- @omrishiv made their first contribution in #1626
- @lex1ng made their first contribution in #1658
- @ChenTaoyu-SJTU made their first contribution in #1672
- @zhenyu-02 made their first contribution in #1682
- @yapple made their first contribution in #1705
- @xvoron made their first contribution in #1708
- @freedown19 made their first contribution in #1716
- @Leafykn made their first contribution in #1718
What's Changed
Full Changelog: v0.4.0...v0.5.0
- Update installation guidance for v0.4.0 by @Jeffwan in #1406
- [Bug] fix webhook config output when using make manifests by @googs1025 in #1412
- Feat: Add AIBrix Helm chart CI by @omerap12 in #1370
- [Feature] KVCache: support GDR by @DwyaneShi in #1411
- Select PD workers in same roleset by @varungup90 in #1409
- [Bug] fix chart-ci by @omerap12 in #1424
- [Misc]: Enhance Chart.yaml metadata with comprehensive information by @Jeffwan in #1414
- [feat]: Add values.schema.json for Helm chart input validation by @Jeffwan in #1415
- [Fix] Fix vLLM NIXL-based P/D samples by @DwyaneShi in #1425
- [Bug] Corrected naming convention for AIBRIX_MODEL_GPU_PROFILE_CACHING_FLAG by @JonathonShea in #1427
- Feat: add liveness & readiness probes to metadata service by @omerap12 in #1391
- [Fix] Disable GGA in NIXL samples by @DwyaneShi in #1436
- doc: correct release date in README.md by @nurali-techie in #1435
- [Misc]: Remove v0.4.0 test files replaced by consolidated base templates by @Jeffwan in #1428
- feature: add stormservice webhook for inject aibrix runtime by @googs1025 in #1403
- [Chore] KVCache: downgrade to cuda 12.1 by @DwyaneShi in #1444
- [misc] Update vLLM PD disaggregation image by @happyandslow in #1445
- [Misc] remove unuseless event by @googs1025 in #1447
- [Feature] KVCache: support max seq len by @DwyaneShi in #1446
- [Bug] stormservice's headless service not set ownerRef by @bigerous in #1442
- [Bug] stormservice's headless service need set PublishNotReadyAddresses by @bigerous in #1441
- [Bug] KVCache: fix metrics by @DwyaneShi in #1450
- [Improvement] KVCache: optimize coll communication by @DwyaneShi in #1451
- Fix P/D disaggregation router to follow Nixl kv_transfer_params by @Jeffwan in #1429
- [Misc] fix regression test manifests by @Jeffwan in #1456
- [Bug] KVCache: fix max seq len support by @DwyaneShi in #1453
- fix: align envoy pod template labels with controller selector by @omerap12 in #1439
- [Docs] Update helm installation guidance by @Jeffwan in #1461
- Refactor to use single loop for least request pod selection by @jiangxiaobin96 in #1431
- [Misc]: Updates docs to reflect the latest router interface by @googs1025 in #1465
- [Misc] Add development workload samples by @Jeffwan in #1467
- Feat(autoscaler): Add retry delays to RestMetricsFetcher by @omerap12 in #1466
- [Feat] Support adapter scaling to desired replicas by @dittops in #1132
- [fix] Correct the headless service ownerReference UID in tests by @Jeffwan in #1469
- Refactor: Extract KV event management to break circular dependency by @ae86zhizhi in #1401
- [Integration] correct head size calculation for AIBrix connectors by @DwyaneShi in #1473
- Cut release v0.4.1 by @Jeffwan in #1477
- Update installation guidance for v0.4.1 by @Jeffwan in #1479
- add least_request_test by @jiangxiaobin96 in #1463
- [Test]: add controller integration test framework by @googs1025 in #1448
- Fix missing traffic_pattern parameters in benchmark script by @happyandslow in #1484
- [Misc] Enhance S3Downloader error handling and IRSA support by @ronaldosaheki in #1483
- [Chore] KVCache: update torch version by @DwyaneShi in #1490
- refactor(scheme): prevent duplicate registration in RegisterSchemas by @googs1025 in #1464
- [DOCS] fix: add ReferenceGrant configuration by @omerap12 in #1486
- Feat: Support role upgrade sequences in stormservice by @omerap12 in #1432
- [Feat]: add deployment webhook to inject AIBrixRuntime sidecar container by @googs1025 in #1457
- refactor: improve autoscaler metrics fetcher design by @Jeffwan in #1487
- [feat]: Add PodSet API for multi-pod worker support in StormService by @Jeffwan in #1475
- Fix missing FQDN and reuse hash func from syncer by @Jeffwan in #1500
- [fix] deep copy template to avoid hash mutation issue by @Jeffwan in #1501
- [Docs] Add
Using NodePort to Expose the Gateway APIsection by @mayooot in #1496 - [Misc] Add unit tests for least_busy_time by @omerap12 in #1495
- add rolset integration test by @googs1025 in #1491
- [Misc]: ignore AlreadyExists and NotFound in controller operations by @googs1025 in #1507
- [Misc] Add unit test for least util routing algorithm by @jiangxiaobin96 in #1497
- [Misc] test: add unit test for leastGpuCacheRouter by @zyfy29 in #1505
- [Bug] fix count ready podset condition by @Epsilon314 in #1518
- [Misc]: add sync-crds makefile cmd by @googs1025 in #1513
- [Misc] Refactor KPA algorithm by @omerap12 in #1503
- [CI]: add verify_crd makefile by @googs1025 in #1514
- [Misc]chore: ignore auto-generated
_version.pyfile by @zhengkezhou1 in #1502 - [Feat] Improve ModelAdapter reliability with retry and pod switching by @Jeffwan in #1472
- [Misc] Fix incorrect expressions for mean TTFT and TPOP in Grafana by @rudeigerc in #1521
- [Docs] Add development test guidance by @Jeffwan in #1520
- [feat]Add imagePullSecrets values for helm by @my-git9 in #1522
- [Misc] Add autoscaling validation of minReplicas and maxReplicas by @jiangxiaobin96 in #1508
- [Misc] test: add unit test for throughputRouter by @zyfy29 in #1516
- [Misc] test: add unit test for leastKvCacheRouter by @zyfy29 in #1515
- [MISC]: add test for least_load by @omerap12 in #1524
- [Bug]fix panic when watcher and metadata not set in kvcache.spec by @zhixian82 in #1526
- [Bug]: fix annotation in deployment webhook by @googs1025 in #1527
- [Misc] Generate StormService golang client by @Jeffwan in #1532
- [Feature] KVCache: support external cache handle by @DwyaneShi in #1531
- [Misc] add model adapter wrapper by @nurali-techie in #1537
- [Misc]: add podset integration test by @googs1025 in #1533
- [Bug] use fallback value to prevent divide zero error by @zyfy29 in #1517
- [Refactor] KVCache: simplify external cache handle create API by @DwyaneShi in #1542
- [MISC] Add wrapper for stormservice by @omerap12 in #1543
- [Misc] add kvcache wrapper by @nurali-techie in #1541
- [Misc]: add stormservice integration test by @googs1025 in #1544
- Improve the runtime downloader quality by @Jeffwan in #1548
- [misc] Consolidate tests under top-level python tests by @Jeffwan in #1549
- [runtime] Improve lora registration reliability by @Jeffwan in #1550
- [runtime] Enrich engine metrics and add support for sglang by @Jeffwan in #1551
- Improve upgradeOrder behavior more intuitive and safer by @Jeffwan in #1547
- [CI] fix vllm-mock runtime sidecar startup issue by @Jeffwan in #1555
- [Mics]: exclude integration tests from make test by @googs1025 in #1556
- [Misc] test: modify apa scale test by @jiangxiaobin96 in #1523
- [Misc] add deployment wrapper by @nurali-techie in #1561
- [API] KVCache: support block hashes by @DwyaneShi in #1545
- [API] Add "FullRecreate" strategy for atomic PodSet recovery by @mayooot in #1511
- [Misc]: add more field for stormservice roleset podset by @googs1025 in #1560
- [Misc] (test): add unit test for prefix_cache_preble by @zhengkezhou1 in #1519
- [MISC] Refactor labels.go & add unit tests by @omerap12 in #1563
- fix(deploy): add missing labels to pod template by @googs1025 in #1568
- [Docs]: update installation guide for AIBrix with Helm prerequisites … by @haitwang-cloud in #1539
- Add multi-metrics-source-support for hpa by @tianzhiqiang3 in #1566
- Update the broken link related to vllm metrics by @atakli in #1574
- [feat]Add tolerations values for chart by @my-git9 in #1579
- [Feat] Batch API Service, working with temporary existing local batch driver by @zhangjyr in #1298
- [Bug] fix: unexpectedly high TTFT in benchmarks results of reasoning (Chain-of-Thought) LLMs by @jwjwjw3 in #1573
- [Misc] Suppress info logs in integration tests by @Jeffwan in #1585
- [Bug] add the return value check to pass linter by @Jeffwan in #1588
- [Feat] Enable redis password in helm chart by @lx1036 in #1586
- [Misc] Add tests for FallbackRouter by @chethanuk in #1558
- [CI] Exclude integration tests from race condition test by @Jeffwan in #1590
- [feat] Add embedding API by @varungup90 in #1570
- [Mics]: optimize deleteReferenceGrant with label selector in ModelRouter controller by @googs1025 in #1589
- refactor: implement clean layered autoscaler architecture by @Jeffwan in #1575
- [Misc] Remove type and impl duplication in autocaler by @Jeffwan in #1596
- [Feat] Support OpenAI Files API and refactor storage library to support local, s3, tos, and redis (for metadata) by @zhangjyr in #1583
- Add recursive download for tos v2/s3 client by @happyandslow in #1571
- [API]: add roleStatuses field in stormservice api by @googs1025 in #1599
- Add gateway-plugin support to generate image and video by @varungup90 in #1603
- Improve pdRouter with load-aware routing by @googs1025 in #1601
- [Misc]: add more field for podautoscaler resource by @omerap12 in #1611
- [Docs]: Update set_metrics override keys and example curl by @googs1025 in #1612
- Use EnqueueRequestsFromMapFunc for KVCache controller by @baozixiaoxixi in #1608
- [Misc] Add tests for PrefixCacheRouting by @chethanuk in #1587
- Simplify autoscaler by unifying client and aggregator by @Jeffwan in #1620
- Resolve the UpdateConfiguration race issue in autoscaler by @Jeffwan in #1621
- Bug: remove duplicate labels that prevent flux helm-controller from deploying the aibrix chart by @TylerGillson in #1615
- Support scaling behaviors and remove configuration duplication by @Jeffwan in #1622
- [Docs] update aws documentation to include AI on EKS AIBrix deployment by @omrishiv in #1626
- feat: add scaling history decision in the status spec by @omerap12 in #1618
- [Chore] KVCache: add max_num_batched_tokens config by @DwyaneShi in #1627
- feat: support metric label by @baozixiaoxixi in #1624
- [Misc] Benchmark: add duration limit and max concurrent sessions settings to client by @ronaldosaheki in #1632
- Support P/D Pooling autoscaling in StormService by @Jeffwan in #1625
- [Feat] OpenAI Batch API Support with Simple LLM Workers by @zhangjyr in #1617
- [Bug] podautoscaler: only emit when replica count changes by @omerap12 in #1630
- update stable and panic value for KPA by @baozixiaoxixi in #1629
- [Mics]: support qos and burst flag in controller by @googs1025 in #1637
- feat(metrics): add autoscaler scale action prometheus metric by @omerap12 in #1638
- [Misc]: sync crds file to helm by @googs1025 in #1635
- [Feature] KVCache: support multi-threading mode by @DwyaneShi in #1628
- [Integration] vllm aibrix scheduler and connectors by @DwyaneShi in #1641
- [Feat] Migrate metadata server from Go to Python by @Jeffwan in #1639
- [Doc] Refining workload generator documentation by @happyandslow in #1631
- [Misc] Multimodality Scenarios Sample Deployment by @happyandslow in #1584
- [Bug] fix: aggregate resource metrics across all containers in pod by @googs1025 in #1643
- [Bug]: Make optional fields in MetricSource by @googs1025 in #1648
- Add requirements file and fix api-key bug by @omerap12 in #1657
- [Doc] Support batch inference usage doc by @Jeffwan in #1646
- [Bug]: add validation for multiple PodAutoscalers targeting the same workload by @googs1025 in #1662
- [Misc] refactor roleSet test validation to use validation Package by @lex1ng in #1658
- [Misc] add num_waiting_reqs metrics by @scarlet25151 in #1668
- [Mics]: fix retry reconciler in podautoscaler controller by @googs1025 in #1669
- [Misc] Batch API envoy integration fix, E2E verification, and document update by @zhangjyr in #1671
- [Misc] Added unit test for WorkloadScale by @nurali-techie in #1666
- [Misc] Added unit test for APA post autoscaler by @nurali-techie in #1659
- [MISC] add unit test for podautoscaler monitor by @omerap12 in #1676
- [CLI] Fix replicate args assignment in main.go by @ChenTaoyu-SJTU in #1672
- Link to system-installed zmq without building by @autopear in #1372
- Refactor the model adapter replicas feature by @Jeffwan in #1670
- Add e2e OpenAI API compatibility test by @Jeffwan in #1678
- Make Gateway API compatible with OpenAI API by @Jeffwan in #1679
- Support downloading lora models through runtime by @Jeffwan in #1680
- [Misc] Optimize runtime sidecar injection logic by @Jeffwan in #1681
- [Docs] Add VKE docs and update P/D examples on VKE by @Jeffwan in #1687
- [Misc] Add Integration Test Utilities for PodAutoscaler Controller by @zhenyu-02 in #1682
- [Fix] KVCache: fix result type of group aware kv manager by @DwyaneShi in #1689
- [Feat]: support mulit metrics for podautoscaler by @googs1025 in #1688
- [Feat]: add podautoscaler webhook by @googs1025 in #1683
- [Misc] Added unit tests for WorkloadScale.SetDesiredReplicas by @nurali-techie in #1692
- [Bug]: fix pd route Algorithms do not check http route by @googs1025 in #1693
- [Integration] Update vLLM v0.10.2 patch by @DwyaneShi in #1695
- [Integration] add AIBrix KVCache x vLLM dockerfile by @DwyaneShi in #1696
- [batch] Bug fixes and code Improvements in batch API by @Jeffwan in #1698
- [batch] Support File List API, configurable job pool size and error file handling by @Jeffwan in #1700
- [batch] Add jsonl input and file validation by @Jeffwan in #1701
- [CI] Reduce runtime container image size by @Jeffwan in #1702
- [Misc] Improve P/D router reliability by @Jeffwan in #1703
- [Misc] Added unit tests related to PodAutoscaler by @nurali-techie in #1694
- [Integration] clone repo w/ tags in vLLM dockerfile by @DwyaneShi in #1706
- [Integration] add AIBrix KVCache x SGLang dockerfile by @yapple in #1705
- [BUG] fix: aibrix_benchmark streaming issue #1674 by @xvoron in #1708
- [Misc]Add unit tests related to gateway server by @freedown19 in #1716
- [Docs]: add multi metrics podautoscaler docs by @googs1025 in #1712
- [feat]: Select and score P/D in same roleset by @varungup90 in #1634
- fix: autoscaler for the podset only collect metrics from rank0 by @scarlet25151 in #1704
- [Misc] Added unit test for metrics fetcher by @nurali-techie in #1709
- [Misc] Made KubeRay optional for AIBrix installations by @Jeffwan in #1724
- [fix] Moved deps from profiling group back to main by @Jeffwan in #1726
- [Docs] Add volcano engine startup docs and quick start by @Jeffwan in #1725
- feat: add EIC connector by @Leafykn in #1718
- [API] stormservice support podgroup by @Epsilon314 in #1506
- Cut v0.5.0 release by @Jeffwan in #1737
- [Docs] Improve the docs and examples by @Jeffwan in #1738
New Contributors
- @JonathonShea made their first contribution in #1427
- @bigerous made their first contribution in #1442
- @jiangxiaobin96 made their first contribution in #1431
- @mayooot made their first contribution in #1496
- @zyfy29 made their first contribution in #1505
- @zhengkezhou1 made their first contribution in #1502
- @tianzhiqiang3 made their first contribution in #1566
- @atakli made their first contribution in #1574
- @jwjwjw3 made their first contribution in #1573
- @lx1036 made their first contribution in #1586
- @chethanuk made their first contribution in #1558
- @baozixiaoxixi made their first contribution in #1608
- @TylerGillson made their first contribution in #1615
- @omrishiv made their first contribution in #1626
- @lex1ng made their first contribution in #1658
- @ChenTaoyu-SJTU made their first contribution in #1672
- @zhenyu-02 made their first contribution in #1682
- @yapple made their first contribution in #1705
- @xvoron made their first contribution in #1708
- @freedown19 made their first contribution in #1716
- @Leafykn made their first contribution in #1718
Full Changelog: v0.4.0...v0.5.0