v0.3.0: Full-Parameter RLHF

hiyouga released this 16 Nov 08:24

· 2053 commits to main since this release

c4facc0

New features

Support full-parameter RLHF training (RM & PPO)
Refactor llmtuner core in #1525 by @hiyouga
Better LLaMA Board: full-parameter RLHF and demo mode

New models

Base models
- ChineseLLaMA-1.3B
- LingoWhale-8B
Instruct/Chat models
- ChineseAlpaca-1.3B
- Zephyr-7B-Alpha/Beta

Bug fix

Fix bugs in partial-parameter (freeze) tuning
Fix #224 #336 #931 #936 #1011 #1489 #1494 #1507 #1514

Contributors

hiyouga

Assets 2