Discussion about this post

Daniel Popescu / ⧉ Pluralisk

Excellent analysis, this really breaks down the magic of RLHF! It makes me wonder what the next steps are for defining 'successful outcomes' beyond basic rewards, especially when we consider long-term ethical alignment for AI – you've laid out the concepts so clearly for us.
