Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations
Abstract: Language models often generate factually incorrect information unsupported by their training data, a phenomenon known as extrinsic hallucination. Existing mitigation approaches often degrade performance on open-ended generation and downstream tasks, limiting their practical utility. We propose an online reinforcement learning method using a novel binary retrieval-augmented reward (RAR) to address this tradeoff. Unlike continuous reward schemes, our approach ...
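As a rough illustration of what a binary retrieval-augmented reward could look like, the Python sketch below assigns reward 1 only when every claim extracted from a generation is supported by retrieved evidence, and 0 otherwise. The `extract_claims`, `retrieve`, and `is_supported` components are hypothetical placeholders standing in for whatever claim extraction, retrieval, and verification modules the paper actually uses; this is a sketch under those assumptions, not the authors' implementation.

```python
from typing import Callable, List

def binary_rar(
    generation: str,
    extract_claims: Callable[[str], List[str]],   # hypothetical claim extractor
    retrieve: Callable[[str], List[str]],         # hypothetical passage retriever
    is_supported: Callable[[str, List[str]], bool],  # hypothetical verifier
) -> float:
    """Return 1.0 iff every extracted claim is supported by retrieved passages."""
    claims = extract_claims(generation)
    if not claims:
        # Assumption: a generation with no factual claims has nothing to
        # contradict, so it receives the full reward.
        return 1.0
    for claim in claims:
        passages = retrieve(claim)
        if not is_supported(claim, passages):
            # Binary scheme: a single unsupported claim zeroes the reward,
            # unlike continuous schemes that would give partial credit.
            return 0.0
    return 1.0
```

The all-or-nothing structure is the point of a binary scheme: partial credit for "mostly true" outputs is removed, so the policy being trained cannot trade a few hallucinated claims against otherwise high-scoring text.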