They replaced the training data with an evaluator (which rates the LLM's output for training?). Interesting, thanks.
Edit: this reminds me of the self-evolving (virtual) robot problem: a robot that is rated by an external moderator and improves over time. E.g.: https://www.sciencedirect.com/science/article/pii/S0925231221003982
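To make the "rated by an external evaluator and improves over time" idea concrete, here is a toy sketch. It is not the method from the article or from the linked paper, just a minimal hill-climbing loop. The names `evaluator` and `evolve`, the target value 3.0, and the step size are all made up for illustration.

```python
import random


def evaluator(candidate: float) -> float:
    """Hypothetical external rater: higher is better, best score at 3.0."""
    return -(candidate - 3.0) ** 2


def evolve(steps: int = 200, seed: int = 0) -> float:
    """Improve a single number using only the evaluator's scores."""
    rng = random.Random(seed)
    best = 0.0                      # arbitrary starting point (assumption)
    best_score = evaluator(best)
    for _ in range(steps):
        # Propose a small random variation of the current best candidate.
        trial = best + rng.gauss(0.0, 0.5)
        score = evaluator(trial)
        # Keep the variation only if the evaluator rates it higher.
        if score > best_score:
            best, best_score = trial, score
    return best


if __name__ == "__main__":
    print(evolve())
```

There is no dataset of "correct answers" anywhere in the loop. The only learning signal is the score the evaluator hands back, which is roughly the setup I was guessing at above.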