Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models

Parameter Lab · Ubiquitous Knowledge Processing Lab, Technical University of Darmstadt · NAVER AI Lab · University of Tübingen · Tübingen AI Center
TLDR: Membership inference attacks (MIA) on large language models (LLMs) have been deemed ineffective, but we show they can succeed when applied at larger scales, such as document or collection levels.


Abstract

Membership inference attacks (MIA) attempt to verify whether specific data was used to train a model. With the rise of large language models (LLMs) and concerns about copyrighted training materials, detecting such usage has become increasingly important. While previous research suggested MIA methods were ineffective on LLMs, we demonstrate their viability when applied at larger scales.

We construct new benchmarks that evaluate MIA performance across different scales, from individual sentences to collections of documents. By adapting recent Dataset Inference (DI) techniques, we develop an approach that aggregates paragraph-level MIA features to enable detection at document and collection levels.

Our work achieves the first successful membership inference attacks on both pre-trained and fine-tuned LLMs. These results challenge previous conclusions about MIA ineffectiveness and demonstrate that such attacks can succeed when multiple documents are analyzed together rather than in isolation.

Multi-Scale Evaluation of MIA

We evaluate MIA at four distinct scales: sentence, paragraph, document, and collection. At the sentence level (avg. 43 tokens), MIA helps detect benchmark contamination and privacy leakage, though success is challenging due to the high overlap between member and non-member sentences. Paragraph-level MIA operates within model context windows (512-2048 tokens) and is relevant for social media content. Document-level MIA targets full texts like research papers (avg. 14,222 tokens), requiring chunking into paragraphs and aggregation of their signals; this scale is crucial for copyright concerns around articles and books. Finally, collection-level MIA examines sets of documents (e.g., 100 documents ≈ 1.4M tokens), which is important for detecting whether entire datasets were used in training. Our results show that MIA achieves the strongest performance at document and collection scales, which is particularly relevant as copyright disputes often center on complete articles rather than fragments.
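To make the document scale concrete, the sketch below shows one simple way to cut a long text into paragraph-sized token chunks before scoring. The 512-token chunk size and the Pythia tokenizer are illustrative choices, not the paper's exact preprocessing.

from transformers import AutoTokenizer

# Illustrative choice: the GPT-NeoX tokenizer used by Pythia models
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-2.8b")

def chunk_document(text: str, chunk_size: int = 512) -> list[str]:
    # Tokenize the full document, then cut it into consecutive chunks of
    # chunk_size tokens: the base units whose MIA scores get aggregated.
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    return [tokenizer.decode(ids[i:i + chunk_size])
            for i in range(0, len(ids), chunk_size)]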

Different scales of MIA evaluation

We ran experiments using Pythia models (2.8B and 6.9B parameters), taking member samples from The Pile training set and non-member samples from its validation and test sets.
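As a baseline paragraph-level signal, one can use the average token log-likelihood under the target model (the classic loss-based MIA score). The sketch below is a minimal illustration; the paper aggregates several MIA features, not only this one.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-2.8b"  # target model under attack
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def avg_log_likelihood(paragraph: str) -> float:
    # Passing labels=input_ids makes the model return the mean cross-entropy
    # over tokens; negate it so that higher values suggest membership.
    inputs = tokenizer(paragraph, return_tensors="pt", truncation=True, max_length=2048)
    loss = model(**inputs, labels=inputs["input_ids"]).loss
    return -loss.item()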

MIA is Effective at the Right Scale

Our experiments demonstrate that MIA effectiveness increases with scale. While sentence and paragraph-level attacks show limited success, document and collection-level attacks achieve much stronger performance.

Benchmark results showing MIA effectiveness at different scales

The key to making MIA work on LLMs is to aggregate MIA scores across a large enough number of tokens. If the MIA performance at the paragraph level (the base unit of aggregation) is better than random chance, and there are enough text units to aggregate (i.e., long enough documents and large enough collections of documents), aggregating the signals allows membership to be classified with high confidence, as shown in the figures below.
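One simple way to realize this aggregation, in the spirit of Dataset Inference, is a statistical test that compares the paragraph scores of a suspect document (or collection) against scores from known non-member text. The snippet below is an illustrative simplification, not the exact procedure from the paper.

from scipy import stats

def is_member(suspect_scores: list[float],
              non_member_scores: list[float],
              alpha: float = 0.01) -> bool:
    # Flag the suspect text as a training member if its paragraph-level MIA
    # scores are significantly higher than the non-member reference scores
    # (one-sided Welch's t-test).
    result = stats.ttest_ind(suspect_scores, non_member_scores,
                             equal_var=False, alternative="greater")
    return result.pvalue < alpha

The more paragraphs that enter the test, the smaller the per-paragraph advantage needs to be for the document- or collection-level decision to become confident, which is exactly the scaling effect described above.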

Aggregation approach for document-level MIA

However, if the paragraph-level MIA AUROC is too close to chance or there is too little text to aggregate over, MIA fails, as shown in the figure below.
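The toy simulation below illustrates this trade-off: with a per-paragraph signal only slightly above chance (AUROC ≈ 0.56 under this Gaussian toy model), ten paragraphs are rarely enough, while a thousand paragraphs almost always yield a confident decision. The score model and numbers are illustrative, not results from the paper.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def detection_rate(shift: float, n_paragraphs: int, trials: int = 1000) -> float:
    # Fraction of simulated member documents detected at p < 0.01 when member
    # paragraph scores are shifted by `shift` relative to non-member scores.
    hits = 0
    for _ in range(trials):
        member = rng.normal(shift, 1.0, n_paragraphs)
        reference = rng.normal(0.0, 1.0, n_paragraphs)
        p = stats.ttest_ind(member, reference, equal_var=False,
                            alternative="greater").pvalue
        hits += p < 0.01
    return hits / trials

print(detection_rate(shift=0.2, n_paragraphs=10))    # weak signal, short document: mostly missed
print(detection_rate(shift=0.2, n_paragraphs=1000))  # same signal, large collection: almost always caught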

Cases where MIA aggregation does not work

Fine-tuning Amplifies the Effectiveness of MIA

Lastly, we test whether MIA can be used to detect test-data leaks in evaluation benchmarks. To do so, we fine-tune Phi-2 on multiple question-answering datasets and measure how well MIA detects membership of the training questions. In the table below, we see that MIA is very effective even at the sentence level and nearly 100% effective for small collections of questions.
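For a sentence-level contamination check of this kind, a score such as Min-K% Prob can be computed directly from the fine-tuned model's token probabilities. The sketch below uses the base Phi-2 checkpoint and k = 20% purely for illustration; it is not the paper's fine-tuned model or full evaluation protocol.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/phi-2"  # illustrative stand-in for the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def min_k_prob(sentence: str, k: float = 0.2) -> float:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"]
    logits = model(input_ids=ids).logits
    # Log-probability assigned to each actual next token
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_log_probs = log_probs.gather(1, ids[0, 1:].unsqueeze(-1)).squeeze(-1)
    # Average over the k% least likely tokens; higher values suggest the
    # sentence was seen during (fine-)tuning.
    n = max(1, int(k * token_log_probs.numel()))
    return token_log_probs.topk(n, largest=False).values.mean().item()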

MIA effectiveness on fine-tuned models

BibTeX

Consider citing us if you find our work relevant.
@misc{puerto2024scalingmembershipinferenceattacks,
  title={Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models},
  author={Haritz Puerto and Martin Gubri and Sangdoo Yun and Seong Joon Oh},
  year={2024},
  eprint={2411.00154},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2411.00154},
}

Want to learn more?

Check out the MIA community page on ResearchTrend.AI