MAIA

Music Adversarial Inpainting Attack

Importance-Driven Adversarial Attacks via Music Inpainting

ISMIR 2025

Yuxuan Liu, Peihong Zhang, Rui Sang, Zhixin Li, Shengchen Li

Why MAIA?

Exposing vulnerabilities in Music Information Retrieval systems

The Problem

Music Information Retrieval (MIR) systems are increasingly deployed in commercial applications, from music recommendation to copyright detection. However, their robustness against adversarial attacks remains largely unexplored.

White-box attack success rate (ASR): 92.8%
Black-box attack success rate (ASR): 80.1%
Perceptual Quality (MOS): 4.0/5

Attack Approaches

🔊 Noise-Based (Traditional)

  • Adds imperceptible perturbations
  • Lacks interpretability
  • Ignores music structure
  • High-frequency artifacts

🎵 MAIA (Ours, Novel)

  • Inpainting-based modifications
  • Preserves musical coherence
  • Importance-driven targeting
  • Natural audio quality

💡 Key Insight

By selectively inpainting the most important music segments, identified through Grad-CAM analysis in the white-box setting, MAIA achieves a higher attack success rate than the noise-based baselines (92.8% versus 82.1% for PGD and 88.5% for C&W) while also scoring higher on perceptual quality (MOS 4.0 versus 3.1 and 3.4).

How MAIA Works

A three-step importance-driven adversarial inpainting framework

01 Importance Analysis

Identify critical time-frequency regions that most influence the model's decision

White-box: Grad-CAM

Uses gradient-weighted class activation mapping to locate influential segments with full model access

$M_c(x,y) = \mathrm{ReLU}\left(\sum_k \alpha_k^c F_k^l(x,y)\right)$
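A minimal NumPy sketch of the heatmap formula above. It assumes the standard Grad-CAM definition of the weights α_k^c (the spatial average of the class-score gradient over feature map k), which this page does not spell out. The function name `grad_cam`, the array shapes, and the toy values are illustrative and are not taken from the MAIA code.

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Compute a Grad-CAM map from one layer's feature maps.

    activations: feature maps F_k of shape (K, H, W)
    gradients:   d(class score)/d(F_k), same shape (K, H, W)
    """
    # alpha_k^c: average the gradient over the spatial axes (assumed definition).
    alphas = gradients.mean(axis=(1, 2))              # shape (K,)
    # Weighted sum of feature maps over the channel axis k.
    cam = np.tensordot(alphas, activations, axes=1)   # shape (H, W)
    # ReLU keeps only regions that push the class score up.
    return np.maximum(cam, 0.0)

# Toy input: two 2x3 feature maps with constant gradients 1.0 and -0.5.
acts = np.stack([np.ones((2, 3)), np.arange(6, dtype=float).reshape(2, 3)])
grads = np.stack([np.full((2, 3), 1.0), np.full((2, 3), -0.5)])
heatmap = grad_cam(acts, grads)
```

In practice the activations and gradients would come from a convolutional layer of the attacked model, and the map would be upsampled to the mel-spectrogram resolution before segments are ranked.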

Black-box: Coarse-to-Fine

Iteratively queries the model to identify important regions through a hierarchical refinement process

$I(C_i) = \dfrac{L(M(\tilde{x}_{-C_i}), y) - L(M(x), y)}{\mathrm{duration}(C_i)}$
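The score above can be sketched with model queries only. This is an illustration under stated assumptions: the masked input is formed by zeroing the candidate segment, L is taken to be cross-entropy, and `toy_model` is a hypothetical stand-in classifier. None of these choices is confirmed by this page.

```python
import math
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int]  # [start, end) in samples

def cross_entropy(probs: Sequence[float], label: int) -> float:
    # Loss L(M(x), y) for a single example (assumed loss function).
    return -math.log(max(probs[label], 1e-12))

def segment_importance(model: Callable[[List[float]], List[float]],
                       x: List[float], y: int,
                       segment: Segment, sample_rate: int) -> float:
    start, end = segment
    masked = list(x)
    masked[start:end] = [0.0] * (end - start)   # assumed masking: silence the segment
    loss_increase = cross_entropy(model(masked), y) - cross_entropy(model(x), y)
    duration = (end - start) / sample_rate      # seconds
    return loss_increase / duration

def toy_model(x: List[float]) -> List[float]:
    # Hypothetical two-class model that only looks at the first four samples.
    energy = sum(v * v for v in x[:4])
    p0 = 1.0 / (1.0 + math.exp(-energy))
    return [p0, 1.0 - p0]

signal = [1.0] * 8
early = segment_importance(toy_model, signal, 0, (0, 4), sample_rate=4)
late = segment_importance(toy_model, signal, 0, (4, 8), sample_rate=4)
```

Here `early` is positive because silencing the first second raises the loss, while `late` is zero because the toy model ignores those samples. Dividing by duration keeps long segments from winning simply because they remove more audio.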
Grad-CAM heatmap on mel-spectrogram

02 Segment Selection

Select top-k most important segments for adversarial modification

Priority-based Selection

Rank segments by importance scores and select the top-k regions that contribute most to classification
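The ranking step can be sketched as below. The segment boundaries and importance scores are made-up values for illustration, and `select_top_k` is a hypothetical helper, not a function from the MAIA code.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds

def select_top_k(scores: Dict[Segment, float], k: int) -> List[Segment]:
    # Rank segments by importance, highest first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Keep the k best and return them in chronological order.
    return sorted(segment for segment, _ in ranked[:k])

# Illustrative scores only.
scores = {
    (0.0, 5.0): 0.1,
    (5.0, 10.0): 0.9,
    (10.0, 15.0): 0.4,
    (15.0, 20.0): 0.7,
}
selected = select_top_k(scores, k=2)
```

With these toy scores the two selected segments are 5-10 s and 15-20 s. Returning them in time order makes it straightforward to pass them on to an inpainting stage one after another.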

Refinement Strategy

Hierarchically refine coarse segments into finer granularity to pinpoint precise attack locations
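One way to picture the refinement is repeated halving: keep the best-scoring segments, split each in two, and rescore. The sketch below assumes that splitting rule, a fixed `keep` budget, and a `min_len` stopping length, none of which is specified on this page. The `toy_score` function stands in for real model queries.

```python
from typing import Callable, List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds

def refine(segments: List[Segment], score: Callable[[Segment], float],
           keep: int, min_len: float) -> List[Segment]:
    current = list(segments)
    while True:
        # Keep only the highest-scoring candidates at this level.
        current = sorted(current, key=score, reverse=True)[:keep]
        if all(end - start <= min_len for start, end in current):
            return sorted(current)
        # Split every segment that is still too long into two halves.
        halves: List[Segment] = []
        for start, end in current:
            if end - start <= min_len:
                halves.append((start, end))
            else:
                mid = (start + end) / 2
                halves += [(start, mid), (mid, end)]
        current = halves

def toy_score(segment: Segment) -> float:
    # Hypothetical score: share of the segment covered by a hidden region 5.0-5.5 s.
    start, end = segment
    overlap = max(0.0, min(end, 5.5) - max(start, 5.0))
    return overlap / (end - start)

coarse = [(0.0, 10.0), (10.0, 20.0), (20.0, 30.0)]
fine = refine(coarse, toy_score, keep=1, min_len=1.25)
```

Starting from three 10 s blocks, the search narrows to the single 1.25 s window that contains the hidden region while scoring only a few candidates per level.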

Selected important segments for inpainting

Selected important segments for inpainting

03 Adversarial Inpainting

Reconstruct selected segments using GACELA with adversarial guidance

GACELA Inpainting Model

GACELA, a generative adversarial context encoder for long audio inpainting, reconstructs the selected segments so that they remain musically coherent with their surroundings

Loss Function

Balance reconstruction quality and attack effectiveness through weighted loss combination

$L = \lambda_{rec} L_{rec} + \lambda_{att} L_{attack}$
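A small worked example of the weighted combination above. The loss values and the weights are arbitrary numbers chosen for illustration; this page does not state the values used by MAIA.

```python
def total_loss(l_rec: float, l_attack: float,
               lambda_rec: float, lambda_att: float) -> float:
    # Weighted sum of the reconstruction term and the attack term.
    return lambda_rec * l_rec + lambda_att * l_attack

# Hypothetical loss values for one candidate inpainting.
l_rec, l_attack = 0.2, 1.0

# Same candidate under two weightings: quality-focused vs. attack-focused.
quality_first = total_loss(l_rec, l_attack, lambda_rec=1.0, lambda_att=0.5)
attack_first = total_loss(l_rec, l_attack, lambda_rec=0.5, lambda_att=1.0)
```

Raising the attack weight relative to the reconstruction weight makes the attack term dominate the objective, which is the trade-off between audio quality and attack strength that the two weights control.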
Figure panels: Before Inpainting, After Inpainting. Adversarial inpainting maintains audio quality

Interactive Demo

Listen to adversarial attacks in action

Original Prediction: Blues (89%)

Adversarial Prediction: Jazz (67%)

Attack Status: Success

Perceptual Metrics

FAD: 11.25
LSD: 1.58
MOS: 4.0/5

Inpainting Regions (timeline: 0-30 s)

5.0-5.4 s
12.0-12.3 s
18.0-18.5 s

Results

MAIA outperforms existing adversarial attack methods

MAIA White-Box (Best)

Attack Success Rate: 92.8%
mAP Degradation: 0.845 → 0.488
FAD: 11.25
MOS: 4.0/5

Baselines (WB): PGD / C&W

Attack Success Rate: 82.1% (PGD), 88.5% (C&W)
mAP after attack: 0.619 (PGD), 0.560 (C&W)
FAD: 12.64 (PGD), 12.11 (C&W)
MOS: 3.1/5 (PGD), 3.4/5 (C&W)

MAIA Black-Box (Best)

Attack Success Rate: 80.1%
mAP Degradation: 0.845 → 0.594
FAD: 12.56
MOS: 3.6/5

Charts: Attack Success Rate Comparison; Perceptual Quality vs. Attack Success

Detailed Performance Metrics

Method     ASR ↑    mAP ↓    FAD ↓    LSD ↓    MOS ↑

White-Box Attacks (CSI)
PGD        82.1%    0.619    12.64    2.10     3.1
C&W        88.5%    0.560    12.11    1.94     3.4
MAIA-WB    92.8%    0.488    11.25    1.58     4.0

Black-Box Attacks (CSI)
NES        70.2%    0.682    13.93    2.27     2.8
ZOO        74.9%    0.639    13.51    2.12     3.0
MAIA-BB    80.1%    0.594    12.56    1.90     3.6
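To read the table as margins rather than raw rates, the short script below subtracts the strongest baseline's ASR from MAIA's in each setting. The numbers are copied from the table above; the helper name `margin_over_best_baseline` is ours.

```python
from typing import Dict

# Attack success rates in percent, copied from the table above.
asr: Dict[str, Dict[str, float]] = {
    "white-box": {"PGD": 82.1, "C&W": 88.5, "MAIA-WB": 92.8},
    "black-box": {"NES": 70.2, "ZOO": 74.9, "MAIA-BB": 80.1},
}

def margin_over_best_baseline(methods: Dict[str, float]) -> float:
    # MAIA's ASR minus the best non-MAIA ASR, in percentage points.
    maia = next(v for name, v in methods.items() if name.startswith("MAIA"))
    best_baseline = max(v for name, v in methods.items()
                        if not name.startswith("MAIA"))
    return round(maia - best_baseline, 1)

margins = {setting: margin_over_best_baseline(m) for setting, m in asr.items()}
print(margins)
```

MAIA leads the strongest baseline by 4.3 percentage points in the white-box setting (over C&W) and by 5.2 points in the black-box setting (over ZOO).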

Resources & Impact

Applications

🛡️

Copyright Protection

Test robustness of music copyright detection systems against adversarial attacks

🔍

Model Auditing

Evaluate security vulnerabilities in deployed MIR models before production

🔒

Privacy Evaluation

Assess privacy risks in music generation and recommendation systems

Citation

@inproceedings{maia2025,
  title={MAIA: Music Adversarial Inpainting Attack},
  author={Liu, Yuxuan and Zhang, Peihong and Sang, Rui and Li, Zhixin and Li, Shengchen},
  booktitle={Proceedings of the International Society for Music Information Retrieval Conference},
  year={2025}
}

Contact

Authors: Yuxuan Liu, Peihong Zhang, Rui Sang, Zhixin Li, Shengchen Li

Institution: Xi'an Jiaotong-Liverpool University

Email: shengchen.li@xjtlu.edu.cn