(WIP) A Little Bit of Reinforcement Learning from Human Feedback

Posted on February 2, 2025 by oxm6k

Acknowledgements

I would like to thank the people who helped me directly with this project: Costa Huang (and, of course, Claude). Indirect shout-outs go to Ross Taylor, Hamish Ivison, John Schulman, and others in my RL sphere.

Additionally, thank you to the contributors on GitHub who helped improve this project.

