News Score: Score the News, Sort the News, Rewrite the Headlines

AbsenceBench: Language Models Can't Tell What's Missing

Abstract: Large language models (LLMs) are increasingly capable of processing long inputs and locating specific information within them, as evidenced by their performance on the Needle in a Haystack (NIAH) test. However, while models excel at recalling surprising information, they still struggle to identify clearly omitted information. We introduce AbsenceBench to assess LLMs' capacity to detect missing information across three domains: numerical sequences, poetry, ...

Read more at arxiv.org
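The abstract describes a task in which a model must notice what has been left out of a document. The sketch below shows one way such an instance could be built and scored. It is not the AbsenceBench code. The helper names `make_absence_instance` and `f1_score` are hypothetical, and so are two other choices: omitting a fixed random fraction of items, and scoring answers with F1.

```python
import random

def make_absence_instance(items, omit_fraction=0.1, seed=0):
    """Hypothetical helper: drop a random fraction of items.

    Returns (original, modified, omitted). A model would see both the
    original and the modified version and be asked to list what is missing.
    """
    rng = random.Random(seed)
    n_omit = max(1, round(len(items) * omit_fraction))
    omit_idx = set(rng.sample(range(len(items)), n_omit))
    modified = [x for i, x in enumerate(items) if i not in omit_idx]
    omitted = [x for i, x in enumerate(items) if i in omit_idx]
    return list(items), modified, omitted

def f1_score(predicted, omitted):
    """Assumed metric: set-based F1 between predicted and truly omitted items.

    Treats items as a set, so it assumes the items are unique.
    """
    pred, gold = set(predicted), set(omitted)
    if not pred and not gold:
        return 1.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, `make_absence_instance([str(n) for n in range(1, 21)], 0.1)` removes two numbers from the sequence 1 to 20. A model that lists exactly those two scores 1.0, and one that names only one of them scores about 0.67. Set-based scoring works for a numeric sequence with no repeats. It may not suit poems that repeat lines.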
