Conversation
@Jzz1943 commented Nov 10, 2025

Support running CosyVoice2 inference with vLLM 0.11.0 (V1 engine only) for better performance.
[Figure: first-chunk latency comparison, vLLM 0.9.0 (V0 engine) vs. vLLM 0.11.0 (V1 engine)]
Under the same conditions, first-chunk latency with vLLM 0.11.0 (V1 engine) is reduced by roughly 15 ms or more compared with vLLM 0.9.0 (V0 engine). The first-chunk latency is also more stable, with much smaller fluctuations than the V0 engine.
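For reference, here is a minimal usage sketch. It assumes the vLLM backend is enabled through a `load_vllm` constructor flag (as in the repo's README); the model directory and prompt files are placeholder paths, and the timing loop is only an illustration of how the first-chunk latency above could be measured.

```python
import time
import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav

# vLLM >= 0.11.0 ships only the V1 engine, so no engine flag is needed;
# `load_vllm=True` is the assumed switch that routes LLM decoding through vLLM.
cosyvoice = CosyVoice2(
    'pretrained_models/CosyVoice2-0.5B',  # placeholder local model directory
    load_jit=False,
    load_trt=False,
    load_vllm=True,
)

prompt_speech_16k = load_wav('./asset/zero_shot_prompt.wav', 16000)

# Streaming synthesis: the PR's metric is the time to the first chunk here.
t0 = time.perf_counter()
for i, out in enumerate(cosyvoice.inference_zero_shot(
        'Hello, this is a first-chunk latency test.',
        'Transcript of the prompt audio.',
        prompt_speech_16k,
        stream=True)):
    if i == 0:
        print(f'first-chunk latency: {(time.perf_counter() - t0) * 1000:.1f} ms')
    torchaudio.save(f'chunk_{i}.wav', out['tts_speech'], cosyvoice.sample_rate)
```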

@Jzz1943 changed the title from "support vLLM >=0.11.0 (V1 engine only)" to "support vLLM >=0.11.0 (V1 engine) for better performance" on Nov 13, 2025
ayutaz pushed a commit to ayutaz/CosyVoice that referenced this pull request Dec 10, 2025
Upstream improvements from FunAudioLLM/CosyVoice:

- PR FunAudioLLM#1640: Support vLLM 0.11.0+ (V1 engine) for better performance
  - First-chunk latency reduced by ~15ms
  - More stable latency with smaller fluctuations
  - Backward compatible with vLLM 0.9.0

- PR FunAudioLLM#1129: Add limited support for MPS devices (Apple Silicon)
  - Enables partial compatibility with M1/M2/M3/M4 Macs
  - Auto-enables JIT on MPS for better performance
  - ONNX models fall back to CPU (ONNX Runtime limitation); see the device-selection sketch after this commit message

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
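A minimal sketch of the device policy the MPS bullet points above describe. The helper names are illustrative, not the referenced PR's actual code; only the PyTorch and ONNX Runtime calls are real APIs.

```python
import torch
import onnxruntime as ort

def pick_device() -> str:
    """Prefer CUDA, then Apple-Silicon MPS, then CPU."""
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"  # per the commit message, JIT is auto-enabled on MPS
    return "cpu"

def onnx_session(model_path: str) -> ort.InferenceSession:
    # ONNX Runtime has no MPS execution provider, so on Apple Silicon the
    # ONNX models always run on CPU, as the commit notes.
    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
```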