[LG] AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning
[CMU]
https://arxiv.org/abs/2506.15651