To alleviate this issue, we formulate the BATON, the first framework specifically designed to enhance the alignment between generated audio and text prompt using human preference feedback. Our BATON ...