huggingface / trl Public

generated from fastai/nbdev_template

Pull requests: huggingface/trl

76 Open 2,312 Closed

Pull requests list

feat: implement DeepSeek unbiased KL estimator for GRPO

#4638 opened Dec 7, 2025 by jlcanta

2 tasks done

Disable gradient checkpointing during no-grad inference to avoid PyTorch warning

#4636 opened Dec 7, 2025 by qgallouedec

Fix KTOTrainer CUDA error for large-vocab models via tensor indexing

#4635 opened Dec 6, 2025 by bhuvanprakash

TRL supports vLLM 0.11

#4633 opened Dec 6, 2025 by qgallouedec

Preserve truncated tokens in BFD packing

#4632 opened Dec 5, 2025 by qgallouedec

Update docs landing with latest details

#4624 opened Dec 4, 2025 by sergiopaniego

6 tasks

[online-dpo] add vllm lora adapter support

#4590 opened Nov 27, 2025 by kashif

5 tasks

🚚 Move KTO to trl.experimental

#4575 opened Nov 25, 2025 by neha222222

Support async reward functions and parallelize call to reward functions.

#4567 opened Nov 24, 2025 by pramodith

3 of 5 tasks

Add cross-tokenizer distillation support for GKD and MiniLLM trainers

#4561 opened Nov 22, 2025 by sambhavnoobcoder

Add PSPO trust region method as alternative to clipping in GRPOTrainer

#4548 opened Nov 19, 2025 by MCDwyer

2 of 5 tasks

fix: add vllm_group_port

#4545 opened Nov 19, 2025 by pointerhacker

3 of 5 tasks

Add compute_metrics parameter for GRPOTrainer

#4534 opened Nov 17, 2025 by colinzhaoxp

Make skip_special_tokens configurable

#4521 opened Nov 13, 2025 by taha-yassine

3 of 5 tasks

[GRPO] switch grpo liger loss to triton version

#4519 opened Nov 13, 2025 by kashif

1 of 8 tasks

adding [SimPER](https://arxiv.org/abs/2502.00883)

#4486 opened Nov 6, 2025 by leeparkuky

2 of 5 tasks

added 10 papers (+trainer cross-links) for #4407

#4441 opened Nov 3, 2025 by SSusantAchary

4 tasks done

docs: Unify model examples to use trl-lib namespace

#4431 opened Nov 2, 2025 by behroozazarkhalili

[ALST/Ulysses] Added ALST/Ulysses documentation

#4420 opened Nov 1, 2025 by kashif • Draft

2 of 9 tasks

Use explicit tiny-Qwen2ForCausalLM-2.5 model_id param in CI tests

#4331 opened Oct 23, 2025 by albertvillanova

refactor: simplify parameter freezing in modeling_base.py

#4305 opened Oct 20, 2025 by Ki-Seki

2 of 5 tasks

🕵️‍♂️ Agent training

#4300 opened Oct 18, 2025 by qgallouedec

Add CISPO loss option and documentation

#4298 opened Oct 16, 2025 by gustavorubim

feat: Add Multi-Token Prediction (MTP) support to SFTTrainer

#4290 opened Oct 15, 2025 by KLGR123

Remove FSDP1 support: use FSDP2 exclusively

#4260 opened Oct 11, 2025 by behroozazarkhalili
