Fusing Reward and Dueling Feedback in Stochastic Bandits
Published in ICML, 2025
This paper proposes two algorithms, ElimFusion and DecoFusion, that fuse absolute reward feedback with relative dueling feedback in stochastic multi-armed bandits. Their regret bounds adaptively leverage whichever feedback type is more informative. Theoretical and empirical results demonstrate significant performance gains over baselines.