News Score: Score the News, Sort the News, Rewrite the Headlines

Extracting memorized pieces of (copyrighted) books from open-weight language models

Abstract: Plaintiffs and defendants in copyright lawsuits over generative AI often make sweeping, opposing claims about the extent to which large language models (LLMs) have memorized plaintiffs' protected expression. Drawing on adversarial ML and copyright law, we show that these polarized positions dramatically oversimplify the relationship between memorization and copyright. To do so, we leverage a recent probabilistic extraction technique to extract pieces of the Books3 dataset from ...

Read more at arxiv.org
