(WIP) A Little Bit of Reinforcement Learning from Human Feedback
Acknowledgements
I would like to thank those who helped me directly with this project: Costa Huang (and, of course, Claude). Indirect shout-outs go to Ross Taylor, Hamish Ivison, John Schulman, and others in my RL sphere.
Additionally, thank you to the contributors on GitHub who helped improve this project.