Adversarial perturbations that prevent diffusion models from learning your music style
Protect your music from being learned and imitated by diffusion models. Training-time adversarial perturbations systematically misalign CLAP embeddings, breaking the condition-sample association that latent diffusion models rely on.
A visual explanation of how adversarial perturbations make music "unlearnable" by diffusion models
Problem: Music diffusion models (e.g., MusicLDM) can learn and imitate musical styles from training data, raising concerns about copyright protection and creative ownership.
Goal: Develop a training-time defense mechanism that prevents diffusion models from effectively learning specific musical styles without degrading perceptual quality.
Core Idea: Inject imperceptible adversarial perturbations δ into music samples before training, causing CLAP (Contrastive Language-Audio Pretraining) embeddings to shift systematically. This breaks the alignment between conditions (embeddings) and samples, preventing the LDM from learning the true style association.
MusicLDM relies on CLAP embeddings as conditional inputs to guide the diffusion process. During training, the model learns associations between embeddings e and samples x.
By perturbing samples to x' = x + δ, we cause CLAP to produce shifted embeddings e' = CLAP(x'). The model then learns the wrong association: (x', e') instead of (x, e).
This systematic misalignment means that even if an attacker tries to generate music using the original embedding e, the model cannot reproduce the true style because it was trained on misaligned pairs.
Key Insight: CLAP acts as a "bridge" between text/audio conditions and the diffusion model. By perturbing this bridge, we break the learning pathway without affecting human perception.
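The bridge mechanism can be sketched with a toy stand-in for the CLAP audio encoder (a fixed random linear projection; the real encoder is a deep network, and `clap_embed` here is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the CLAP audio encoder: a fixed linear projection into a
# 512-dim embedding space. The real CLAP encoder is a deep network; this
# only illustrates the mechanism.
W = rng.standard_normal((512, 16000)) / np.sqrt(16000)

def clap_embed(x):
    """Hypothetical stand-in for CLAP(x): project and L2-normalize."""
    e = W @ x
    return e / np.linalg.norm(e)

x = rng.standard_normal(16000)             # one second of 16 kHz audio (toy)
delta = 0.05 * rng.standard_normal(16000)  # small additive perturbation

e = clap_embed(x)                # condition the LDM *should* learn for x
e_prime = clap_embed(x + delta)  # condition it actually sees during training

# Δe: the misalignment injected between condition and sample
print("embedding shift:", np.linalg.norm(e_prime - e))
```

During LDM training the misaligned pair (x', e') replaces (x, e), so the learned condition-sample association no longer points at the true style.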
Interactive visualization showing how adversarial perturbations affect the CLAP embedding space and disrupt the learning process in MusicLDM.
Objective: Maximize Δe = ||CLAP(x + δ) - CLAP(x)||₂
Subject to:
- Energy constraint: ||δ||_p ≤ ε (bounds the perturbation magnitude)
- Perceptual constraint: D(x, x') ≤ τ for a perceptual distance D
Optimization Strategy:
1. Initialize δ ~ N(0, σ²)
2. For each iteration:
a. Compute gradient: ∇_δ ||CLAP(x + δ) - CLAP(x)||₂
b. Update: δ ← δ + α · sign(∇_δ)
c. Project onto the ℓ∞ ball: δ ← clip(δ, -ε, ε)
3. Return optimal δ*
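The loop above is a projected sign-ascent (PGD-style) attack on the embedding distance. A minimal sketch, using a linear stand-in encoder so ∇_δ has a closed form; with the real CLAP encoder the gradient would come from autograd (e.g. PyTorch), but the initialize/update/project structure is the same:

```python
import numpy as np

rng = np.random.default_rng(1)
D_IN, D_EMB = 2048, 64

# Linear stand-in for CLAP so the gradient is analytic. Note: for a linear
# map, ||W(x+δ) - Wx||₂ = ||Wδ||₂ depends only on δ; for the real nonlinear
# CLAP the objective depends on x as well.
W = rng.standard_normal((D_EMB, D_IN)) / np.sqrt(D_IN)

def embed(x):
    return W @ x

def perturb(x, eps=0.01, alpha=0.002, sigma=1e-3, steps=50):
    """Sign ascent maximizing ||embed(x+δ) - embed(x)||₂ s.t. ||δ||_∞ ≤ eps."""
    delta = sigma * rng.standard_normal(x.shape)  # 1. initialize δ ~ N(0, σ²)
    e = embed(x)
    for _ in range(steps):                        # 2. iterate
        diff = embed(x + delta) - e
        # a. gradient of ||Wδ||₂ w.r.t. δ is Wᵀ(Wδ)/||Wδ||₂
        grad = W.T @ (diff / (np.linalg.norm(diff) + 1e-12))
        delta = delta + alpha * np.sign(grad)     # b. sign-ascent update
        delta = np.clip(delta, -eps, eps)         # c. project onto ℓ∞ ball
    return delta                                  # 3. return δ*

x = rng.standard_normal(D_IN)
delta = perturb(x)
print("Δe =", np.linalg.norm(embed(x + delta) - embed(x)))
print("||δ||_∞ =", np.abs(delta).max())
```

The `clip` projection implements the ε-bound for p = ∞; other norm balls would need a different projection step.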
To validate that protected samples prevent effective learning, we plan to evaluate:
Measure how well models trained on protected data can imitate the original style using CLAP similarity scores and perceptual metrics.
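Style imitation is typically scored as the cosine similarity between CLAP embeddings of generated and reference audio. A minimal sketch (the embeddings below are random placeholders, not real CLAP outputs):

```python
import numpy as np

def clap_similarity(e_gen, e_ref):
    """Cosine similarity between two embeddings (1.0 = identical direction)."""
    return float(np.dot(e_gen, e_ref) /
                 (np.linalg.norm(e_gen) * np.linalg.norm(e_ref) + 1e-12))

rng = np.random.default_rng(4)
e_ref = rng.standard_normal(512)               # embedding of the true style
e_gen = e_ref + 0.5 * rng.standard_normal(512) # imitation drifted off-style
print(f"CLAP similarity: {clap_similarity(e_gen, e_ref):.3f}")
```

Lower similarity for models trained on protected data would indicate the defense worked.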
Quantify the embedding shift Δe and verify that it exceeds a threshold while maintaining perceptual quality.
Ensure perturbations remain imperceptible through listening tests (MOS) and objective metrics (PESQ, STOI).
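PESQ and STOI require reference implementations, but a crude first-pass objective check is the signal-to-perturbation ratio in decibels. A sketch only; this is not a substitute for perceptual metrics or listening tests:

```python
import numpy as np

def perturbation_snr_db(x, delta):
    """Signal-to-perturbation ratio in dB; higher means the perturbation is
    weaker relative to the signal (a crude imperceptibility proxy)."""
    return 10.0 * np.log10(np.sum(x**2) / (np.sum(delta**2) + 1e-20))

rng = np.random.default_rng(2)
x = rng.standard_normal(48000)             # toy waveform
delta = 0.01 * rng.standard_normal(48000)  # perturbation 100x weaker in amplitude
print(f"SNR ≈ {perturbation_snr_db(x, delta):.1f} dB")
```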
Test against various preprocessing (resampling, compression) and verify protection persists across different diffusion model architectures.
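A minimal robustness probe under these assumptions: pass both clean and protected audio through a preprocessing stand-in (a moving-average low-pass, as a crude proxy for resampling or lossy compression) and measure how much of the embedding shift survives. Both the encoder and the filter are illustrative stand-ins, not the real pipeline:

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.standard_normal((64, 4000)) / np.sqrt(4000)
embed = lambda x: W @ x  # linear stand-in for the CLAP encoder

def smooth(x, k=8):
    """Moving-average low-pass filter: crude proxy for resampling/compression."""
    return np.convolve(x, np.ones(k) / k, mode="same")

x = rng.standard_normal(4000)
delta = 0.02 * np.sign(rng.standard_normal(4000))  # ℓ∞-bounded perturbation

shift_clean = np.linalg.norm(embed(x + delta) - embed(x))
shift_after = np.linalg.norm(embed(smooth(x + delta)) - embed(smooth(x)))
print(f"Δe before preprocessing: {shift_clean:.3f}")
print(f"Δe after preprocessing:  {shift_after:.3f} "
      f"({100 * shift_after / shift_clean:.0f}% retained)")
```

High-frequency perturbations are attenuated most by low-pass preprocessing, which is why robust protection favors perturbation energy the preprocessing cannot cheaply remove.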
This method relies on CLAP as the conditioning mechanism. Models using different conditioning approaches (e.g., direct text-to-audio) may not be affected.
Strong preprocessing (e.g., aggressive resampling, filtering) might remove perturbations. Protection effectiveness depends on maintaining perturbation integrity.
Protection is only effective if applied before training. Once a model is trained on unprotected data, this method cannot retroactively protect it.
There exists a trade-off between protection strength and perceptual quality. Very strong perturbations may become perceptible, while weak ones may not provide sufficient protection.