Fusing Reward and Dueling Feedback in Stochastic Bandits
Published in ICML, 2025
This paper proposes two algorithms, ElimFusion and DecoFusion, that fuse absolute reward feedback with relative dueling feedback in stochastic multi-armed bandits. Both achieve regret bounds that adapt to whichever feedback type is more informative. Theoretical and empirical results show performance gains over baselines.
Recommended citation: Wang, X. et al. (2025). "Fusing Reward and Dueling Feedback in Stochastic Bandits." arXiv preprint arXiv:2504.15812.