(WIP) A Little Bit of Reinforcement Learning from Human Feedback















Acknowledgements

I would like to thank the following people who helped me directly with this project: Costa Huang (and, of course, Claude). Indirect shout-outs go to Ross Taylor, Hamish Ivison, John Schulman, and others in my RL sphere.

Additionally, thank you to the contributors on GitHub who helped improve this project.


Citation

@book{rlhf2024,
  author = {Nathan Lambert},
  title = {Reinforcement Learning from Human Feedback},
  year = {2024},
  publisher = {Online},
  url = {https://rlhfbook.com},
}

© 2024 RLHF Book Team


