Importance-Driven Adversarial Attacks via Music Inpainting
Exposing vulnerabilities in Music Information Retrieval systems
Music Information Retrieval (MIR) systems are increasingly deployed in commercial applications, from music recommendation to copyright detection. However, their robustness against adversarial attacks remains largely unexplored.
MAIA (Music Adversarial Inpainting Attack) selectively inpaints the most important music segments, identified through Grad-CAM analysis, and achieves higher attack success rates and better perceptual quality than noise-based methods.
A three-step importance-driven adversarial inpainting framework
Identify critical time-frequency regions that most influence the model's decision
White-box: uses Grad-CAM (gradient-weighted class activation mapping) to locate influential segments, given full model access
Black-box: iteratively queries the model to identify important regions through a hierarchical refinement process
Grad-CAM heatmap on mel-spectrogram
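The black-box step above can be sketched as a coarse-to-fine occlusion search. This is a minimal illustration, not the authors' implementation: `score_fn` is a hypothetical callable returning the model's confidence for the original output, and silencing a segment to measure its importance is an assumption made here for simplicity.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open [start, end) range of sample indices


def rank_by_occlusion(
    audio: Sequence[float],
    score_fn: Callable[[List[float]], float],
    segments: List[Segment],
    keep: int,
) -> List[Segment]:
    """Return the `keep` segments whose silencing lowers the score the most."""
    base = score_fn(list(audio))  # one query for the unmodified input

    def drop(seg: Segment) -> float:
        start, end = seg
        masked = list(audio)
        masked[start:end] = [0.0] * (end - start)  # assumption: occlude with silence
        return base - score_fn(masked)  # one query per candidate segment

    return sorted(segments, key=drop, reverse=True)[:keep]


def refine_important_segments(
    audio: Sequence[float],
    score_fn: Callable[[List[float]], float],
    n_coarse: int = 4,
    keep: int = 1,
    levels: int = 2,
) -> List[Segment]:
    """Coarse-to-fine search: rank segments, halve the survivors, repeat."""
    n = len(audio)  # assumes len(audio) >= n_coarse
    step = n // n_coarse
    segments = [
        (i * step, n if i == n_coarse - 1 else (i + 1) * step)
        for i in range(n_coarse)
    ]
    for _ in range(levels):
        halves: List[Segment] = []
        for start, end in rank_by_occlusion(audio, score_fn, segments, keep):
            mid = (start + end) // 2
            # A one-sample segment cannot be split further.
            halves.extend([(start, end)] if mid == start else [(start, mid), (mid, end)])
        segments = halves
    return sorted(rank_by_occlusion(audio, score_fn, segments, keep))
```

With a toy `score_fn` that depends on a single sample, the search narrows from a quarter of the signal down to that sample in two refinement levels, at the cost of a handful of queries per level.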
Select top-k most important segments for adversarial modification
Rank segments by importance scores and select the top-k regions that contribute most to classification
Hierarchically refine coarse segments into finer granularity to pinpoint precise attack locations
Selected important segments for inpainting
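A minimal sketch of the ranking step described above, assuming the importance map is a frequency-by-time grid (such as a Grad-CAM heatmap over a mel-spectrogram) and that segments are fixed-length runs of frames. The helper names and the mean-pooling choice are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def segment_scores(heatmap: Sequence[Sequence[float]], frames_per_segment: int) -> List[float]:
    """Average a (frequency x time) importance map within each time segment."""
    n_frames = len(heatmap[0])
    # Collapse the frequency axis: one importance value per time frame.
    per_frame = [sum(row[t] for row in heatmap) / len(heatmap) for t in range(n_frames)]
    scores = []
    for start in range(0, n_frames, frames_per_segment):
        chunk = per_frame[start:start + frames_per_segment]
        scores.append(sum(chunk) / len(chunk))
    return scores


def top_k_segments(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring segments, returned in time order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

Returning the indices in time order keeps the selected regions easy to map back onto the waveform for inpainting.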
Reconstruct selected segments using GACELA with adversarial guidance
Generative Adversarial Context Encoder for Long Audio inpainting ensures musically coherent reconstruction
Balance reconstruction quality and attack effectiveness through weighted loss combination
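The weighted loss combination can be illustrated with a small sketch. The L1 reconstruction term, the weight names `w_recon` and `w_adv`, and their default values are assumptions chosen for illustration; the page does not specify the exact loss terms or weights.

```python
from typing import Sequence


def l1_reconstruction_loss(original: Sequence[float], inpainted: Sequence[float]) -> float:
    """Mean absolute difference between original and inpainted samples."""
    return sum(abs(o - i) for o, i in zip(original, inpainted)) / len(original)


def combined_loss(
    recon_loss: float,
    adv_loss: float,
    w_recon: float = 1.0,  # hypothetical weight on reconstruction fidelity
    w_adv: float = 0.5,    # hypothetical weight on attack strength
) -> float:
    """Weighted sum trading reconstruction fidelity against attack strength."""
    return w_recon * recon_loss + w_adv * adv_loss
```

Raising `w_adv` relative to `w_recon` pushes the inpainted segment toward fooling the model at the expense of fidelity to the original audio, and lowering it does the opposite.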
Before and after inpainting: adversarial inpainting maintains audio quality
MAIA outperforms the white-box and black-box baselines on every reported metric
| Method | ASR ↑ | mAP ↓ | FAD ↓ | LSD ↓ | MOS ↑ |
|---|---|---|---|---|---|
| White-Box Attacks (CSI) | | | | | |
| PGD | 82.1% | 0.619 | 12.64 | 2.10 | 3.1 |
| C&W | 88.5% | 0.560 | 12.11 | 1.94 | 3.4 |
| MAIA-WB | 92.8% | 0.488 | 11.25 | 1.58 | 4.0 |
| Black-Box Attacks (CSI) | | | | | |
| NES | 70.2% | 0.682 | 13.93 | 2.27 | 2.8 |
| ZOO | 74.9% | 0.639 | 13.51 | 2.12 | 3.0 |
| MAIA-BB | 80.1% | 0.594 | 12.56 | 1.90 | 3.6 |
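For readers unfamiliar with the ASR column, the sketch below shows one common way to compute an attack success rate. The definition used here (untargeted, counted only over inputs the model handled correctly before the attack, with discrete labels standing in for the model's output) is an assumption; the page does not state how ASR is defined for this task.

```python
from typing import Hashable, Sequence


def attack_success_rate(
    true_labels: Sequence[Hashable],
    clean_preds: Sequence[Hashable],
    adv_preds: Sequence[Hashable],
) -> float:
    """Fraction of correctly handled clean inputs whose prediction becomes wrong under attack."""
    # Only inputs the model got right before the attack count as attack targets.
    attacked = [
        (y, a) for y, c, a in zip(true_labels, clean_preds, adv_preds) if c == y
    ]
    if not attacked:
        return 0.0
    return sum(1 for y, a in attacked if a != y) / len(attacked)
```

Excluding inputs that were already misclassified avoids crediting the attack for errors the model makes on its own.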
Test robustness of music copyright detection systems against adversarial attacks
Evaluate security vulnerabilities in deployed MIR models before production
Assess privacy risks in music generation and recommendation systems
@inproceedings{maia2025,
title={MAIA: Music Adversarial Inpainting Attack},
author={Liu, Yuxuan and Zhang, Peihong and Sang, Rui and Li, Zhixin and Li, Shengchen},
booktitle={Proceedings of the International Society for Music Information Retrieval Conference},
year={2025}
}
Authors: Yuxuan Liu, Peihong Zhang, Rui Sang, Zhixin Li, Shengchen Li
Institution: Xi'an Jiaotong-Liverpool University
Email: shengchen.li@xjtlu.edu.cn