Researchers Reverse-Engineer LLMs to Extract Structured Datasets, Unveiling Compressed Knowledge

LLM-Deflate: Extracting LLMs Into Datasets

Sep 19, 2025, Greg Diamos

Large Language Models compress massive amounts of training data into their parameters. This compression is lossy but highly effective: billions of parameters can encode the essential patterns from terabytes of text. What’s less obvious is that this process can be reversed: we can systematically extract structured datasets from trained models that reflect t...