Detecting Training Text in Language Models
Results
We find that, for the models we tested, it is possible to detect with high accuracy whether a passage of text was in the training set of a language model. Our main results are as follows:
- Detection accuracy: For GPT-2, GPT-3, and GPT-4, we can detect whether a passage was in the training set with over 95% accuracy using a simple classifier based on model loss.
- Loss-based detection: Passages that were in the training set have significantly lower loss (i.e., they are more predictable) than passages that were not in the training set. This difference is robust across models and datasets; a minimal code sketch of this loss-based check appears below.
- Generalization: The detection method generalizes across different types of text and is not limited to specific domains or genres.
- Implications: These results suggest that it is feasible to audit language models for the presence of specific training data, which has implications for copyright, privacy, and responsible AI development.
For more details, see the full report and methodology.
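As a concrete illustration of the loss-based check, the sketch below scores a passage by its mean per-token loss under an open-weights causal language model and applies a threshold. This is a minimal sketch under stated assumptions, not the classifier from the report: the model name ("gpt2") and the threshold value are placeholders, and in practice the threshold would be calibrated on known member and non-member passages.

```python
# Minimal sketch of loss-based membership detection, assuming a HuggingFace
# causal LM. The model name and threshold are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; any open-weights causal LM works here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def passage_loss(text: str) -> float:
    """Mean per-token cross-entropy of the passage under the model."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

def likely_in_training_set(text: str, threshold: float = 3.0) -> bool:
    """One-feature classifier: unusually low loss -> likely seen in training."""
    return passage_loss(text) < threshold

if __name__ == "__main__":
    print(likely_in_training_set("To be, or not to be, that is the question."))
```

In this framing, the "classifier" is just a threshold on a single feature (mean loss); more elaborate detectors in the literature combine per-token statistics, reference models, or calibration terms.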
Related Papers
- Shi, Weijia et al. (2023). Detecting Pretraining Data from Large Language Models. arXiv:2310.16789
- Zhang, Weichao et al. (2024). Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method. arXiv:2409.14781
- Duarte, André V. et al. (2024). DE-COP: Detecting Copyrighted Content in Language Models Training Data. arXiv:2402.09910
- Meeus, Matthieu et al. (2024). Copyright Traps for Large Language Models. arXiv:2402.09363
- Ni, Shiwen et al. (2024). Training on the Benchmark Is Not All You Need. arXiv:2409.01790
- Wang, Jeffrey G. et al. (2024). Pandora's White-Box: Precise Training Data Detection and Extraction in Large Language Models. arXiv:2402.17012
- Zhang, Anqi and Wu, Chaofeng (2024). Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens. arXiv:2407.21248
- Zhang, Jingyang et al. (2024). Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models. arXiv:2404.02936
- Wei, Johnny Tian-Zheng et al. (2024). Proving membership in LLM pretraining data via data watermarks. arXiv:2402.10892
- Meeus, Matthieu et al. (2023). Did the Neurons Read your Book? Document-level Membership Inference for Large Language Models. arXiv:2310.15007
Summary: Copyright Traps for Large Language Models
The paper Copyright Traps for Large Language Models addresses the ongoing debate about the fair use of copyright-protected content in training large language models (LLMs). The authors explore "document-level inference," which aims to determine if a specific piece of content was present in a model's training data using only black-box access. While state-of-the-art methods rely on natural memorization, the paper investigates the use of deliberately inserted "copyright traps"—unique sequences embedded in training data—to detect unauthorized use. The study finds that medium-length traps repeated a moderate number of times are not reliably detectable, but longer sequences repeated many times can be detected and used as effective copyright traps. This has significant implications for copyright enforcement and auditing of LLMs.
Copyright Traps
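To make the trap idea concrete, here is a hedged sketch of how one might test, with only loss queries to the model, whether a planted trap sequence was memorized: compare the model's loss on the trap against losses on control sequences the model has never seen. The model name, trap text, control sequences, and z-score cutoff below are all illustrative assumptions, not the exact procedure from the paper.

```python
# Hypothetical sketch of a trap-detection check, assuming a HuggingFace causal LM.
# A trap counts as "memorized" if its loss is markedly lower than the losses of
# unseen control sequences (simple z-score test with a placeholder cutoff).
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_loss(model, tokenizer, text: str) -> float:
    """Mean per-token cross-entropy of `text` under the model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

def trap_detected(model, tokenizer, trap: str, controls: list[str]) -> bool:
    """Compare the trap's loss to control losses via a z-score."""
    trap_loss = sequence_loss(model, tokenizer, trap)
    control_losses = [sequence_loss(model, tokenizer, c) for c in controls]
    mean = sum(control_losses) / len(control_losses)
    var = sum((x - mean) ** 2 for x in control_losses) / max(len(control_losses) - 1, 1)
    std = math.sqrt(var) or 1e-8
    return (trap_loss - mean) / std < -2.0  # cutoff is an illustrative choice

if __name__ == "__main__":
    name = "gpt2"  # placeholder model
    tok = AutoTokenizer.from_pretrained(name)
    lm = AutoModelForCausalLM.from_pretrained(name)
    trap = "zephyr quantum marigold 7183 sentinel phrase"          # hypothetical trap
    controls = [
        "obsidian walrus 4417 never appears in any corpus",        # hypothetical controls
        "paper lantern 9024 unseen control sequence example",
    ]
    print("trap memorized:", trap_detected(lm, tok, trap, controls))
```

The design choice to score against unseen controls rather than a fixed threshold reflects the black-box setting described in the summary: the auditor controls the trap and control text, but not the model's training pipeline or its typical loss scale.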
Datasets Used for Building LLMs
For a detailed overview of various sources and datasets used for training large language models (LLMs), please visit our dedicated wiki page:
Datasets Used for Building LLMs.
This resource describes the types of data, their provenance, and considerations for responsible AI development.
Responsible AI Conferences
For a curated and regularly updated list of major conferences focused on responsible AI—including policy, governance, ethics, fairness, and domain-specific applications—please visit our dedicated wiki page:
Responsible AI Conferences.
This resource highlights key events, dates, locations, and opportunities for engagement in the responsible AI community.