Pull requests: huggingface/trl


Pull requests list

Reject parallelism_config with cp_size>1 or sp_size>1 in GRPO/RLOO
#5699 opened May 4, 2026 by kashif Collaborator
8 tasks
Add MFU helpers
#5698 opened May 4, 2026 by AmineDiro Member
1 task done
[experimental] Add OpenReward Standard environment adapter
#5696 opened May 3, 2026 by adithya-s-k Collaborator
3 of 6 tasks
Final logits softcapping support for async GRPO Trainer
#5691 opened May 2, 2026 by mlarnouhet
3 of 8 tasks
Enable chunked NLL loss with VLM in SFT
#5684 opened Apr 29, 2026 by qgallouedec Member
Reduce inconsistency across trainer test files
#5678 opened Apr 29, 2026 by qgallouedec Member
Enable chunked NLL loss with PEFT in SFT
#5676 opened Apr 28, 2026 by qgallouedec Member
feat(vllm-serve): add --reasoning-parser and --reasoning-config flags
#5672 opened Apr 28, 2026 by kfirah-create
2 of 8 tasks
DeepSeek v4
#5641 opened Apr 25, 2026 by qgallouedec Member Draft
8 tasks
Align tiny-Glm4MoeForCausalLM with GLM-4.5 reference config
#5638 opened Apr 24, 2026 by qgallouedec Member
8 tasks
Refactor tiny-model generation scripts
#5637 opened Apr 24, 2026 by qgallouedec Member
Add LoRA support for AsyncGRPO
#5610 opened Apr 21, 2026 by jonahsamost
2 of 8 tasks
experimental: Self-Distillation Zero
#5609 opened Apr 20, 2026 by LeonEricsson Collaborator
1 of 8 tasks
support prefetch/prefetch_depth for async GRPO for ~5% speedups
#5602 opened Apr 20, 2026 by winglian Contributor
1 of 8 tasks
Fix nested vocab_size for DistillationTrainer and GOLDTrainer
#5592 opened Apr 19, 2026 by Beichen-Ma
2 of 8 tasks
feat: add TargetPO trainer
#5591 opened Apr 18, 2026 by JeanKaddour Draft
4 of 8 tasks
Add training chat template for Qwen3-2507
#5574 opened Apr 16, 2026 by SwayamInSync Contributor
refactor: self distillation trainers (sdpo/sdft/...)
#5573 opened Apr 16, 2026 by LeonEricsson Collaborator
2 of 8 tasks
Improve BrowserGym examples for latest OpenEnv version
#5568 opened Apr 16, 2026 by sergiopaniego Member
8 tasks
Revert VLM support in parse_response
#5561 opened Apr 15, 2026 by qgallouedec Member
Accept processor in get_training_chat_template
#5560 opened Apr 15, 2026 by qgallouedec Member
Move experimental example scripts into their trainer folders
#5556 opened Apr 15, 2026 by sergiopaniego Member
1 of 8 tasks