Comparative Study: OPT-350M and GPT-2 with Reward-Based Training (Collection)
Comparative study: training OPT-350M and GPT-2 on Anthropic’s HH-RLHF dataset using reward-based training. 2 items. Updated Sep 11, 2023.