GRPO:Zero implements Group Relative Policy Optimization for training LLMs with minimal dependencies,...

Tokens: 488
Snippets: 6
Trust Score: 5
License: Apache-2.0
Update: 1 year ago