Detecting Training Text in Language Models

Results

We find that, for the models we tested, it is possible to detect with high accuracy whether a passage of text appeared in a language model's training set.

For more details, see the full report and methodology.
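To make the underlying idea concrete, below is a minimal sketch of a loss-threshold membership test, a common baseline for this kind of detection. It is an illustration rather than the method used in the report: the model name ("gpt2") and the calibration threshold are placeholders, and it assumes token-level loss access through the Hugging Face transformers library.

```python
# Minimal sketch of a loss-threshold membership test (an illustrative baseline,
# not the report's exact method). Assumes logit-level access via Hugging Face
# transformers; the model name and threshold below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"   # placeholder model
THRESHOLD = 3.0       # placeholder: calibrate on known member / non-member text

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def average_token_loss(passage: str) -> float:
    """Mean cross-entropy the model assigns to the passage's tokens."""
    enc = tokenizer(passage, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

def likely_in_training_set(passage: str) -> bool:
    # An unusually low loss (the model finds the text very predictable) is
    # treated as evidence of membership.
    return average_token_loss(passage) < THRESHOLD

print(likely_in_training_set("The quick brown fox jumps over the lazy dog."))
```

In practice the threshold is calibrated against passages known to be outside the training set, and stronger statistics than the raw mean loss are typically used.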

Related Papers

Summary: Copyright Traps for Large Language Models

The paper Copyright Traps for Large Language Models addresses the ongoing debate about the fair use of copyright-protected content in training large language models (LLMs). The authors explore "document-level inference," which aims to determine whether a specific piece of content was present in a model's training data using only black-box access. Whereas state-of-the-art methods rely on natural memorization, the paper investigates deliberately inserted "copyright traps" (unique sequences embedded in the training data) as a way to detect unauthorized use. The study finds that medium-length traps repeated a moderate number of times are not reliably detectable, whereas longer sequences repeated many times can be detected and therefore serve as effective copyright traps. This has significant implications for copyright enforcement and auditing of LLMs.
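To illustrate the trap mechanism described above, here is a minimal sketch, not the paper's protocol: a high-entropy synthetic sequence is generated and embedded in a document before training, and detection is framed as a rank test of the trained model's loss on the trap against its losses on reference sequences it never saw. The helper names, the gibberish-word generator, and the 0.95 cut-off are illustrative assumptions.

```python
# Minimal sketch of the copyright-trap idea, not the paper's exact protocol.
# A unique synthetic sequence is injected into a document before training;
# afterwards, membership is inferred by checking whether the trained model
# finds the trap unusually predictable relative to never-seen references.
import random
import string

def make_trap_sequence(num_words: int = 100, seed: int = 0) -> str:
    """Generate a unique, high-entropy sequence to embed in a document."""
    rng = random.Random(seed)
    words = [
        "".join(rng.choices(string.ascii_lowercase, k=rng.randint(4, 9)))
        for _ in range(num_words)
    ]
    return " ".join(words)

def detect_trap(score_fn, trap: str, references: list[str]) -> bool:
    """score_fn(text) -> mean negative log-likelihood under the target model.

    The trap is flagged as memorized if its loss is lower than almost all
    reference sequences drawn the same way but never included in training.
    """
    trap_loss = score_fn(trap)
    ref_losses = [score_fn(r) for r in references]
    rank = sum(trap_loss < r for r in ref_losses) / max(len(ref_losses), 1)
    return rank > 0.95  # placeholder significance level

# Usage: embed make_trap_sequence() in a document before training; after
# training, call detect_trap with a loss function backed by the trained model
# and fresh reference sequences generated from different seeds.
```

The rank test against reference sequences is what keeps the check meaningful under black-box access: it asks whether the model is measurably better at predicting the trap than at predicting equally unlikely text it was never trained on.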

Copyright Traps

Datasets Used for Building LLMs

For a detailed overview of various sources and datasets used for training large language models (LLMs), please visit our dedicated wiki page: Datasets Used for Building LLMs. This resource describes the types of data, their provenance, and considerations for responsible AI development.

Responsible AI Conferences

For a curated and regularly updated list of major conferences focused on responsible AI—including policy, governance, ethics, fairness, and domain-specific applications—please visit our dedicated wiki page: Responsible AI Conferences. This resource highlights key events, dates, locations, and opportunities for engagement in the responsible AI community.