MAIA: Music Adversarial Inpainting Attack

Why MAIA?

Exposing vulnerabilities in Music Information Retrieval systems

The Problem

Music Information Retrieval (MIR) systems are increasingly deployed in commercial applications, from music recommendation to copyright detection. However, their robustness against adversarial attacks remains largely unexplored.

92.8%

White-box ASR

80.1%

Black-box ASR

4.0/5

Perceptual Quality

Attack Approaches

🔊

Noise-Based

Adds imperceptible perturbations
Lacks interpretability
Ignores music structure
High-frequency artifacts

Traditional

🎵

MAIA (Ours)

Inpainting-based modifications
Preserves musical coherence
Importance-driven targeting
Natural audio quality

Novel

💡

Key Insight

By selectively inpainting the most important music segments, identified through Grad-CAM in the white-box setting and coarse-to-fine querying in the black-box setting, MAIA achieves higher attack success rates than noise-based methods while maintaining better perceptual quality.

How MAIA Works

A three-step importance-driven adversarial inpainting framework

Importance Analysis

Identify critical time-frequency regions that most influence the model's decision

White-box: Grad-CAM

Uses gradient-weighted class activation mapping to locate influential segments with full model access

M_c(x,y) = ReLU(Σ_k α_k^c F_k^l(x,y))
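The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not the MAIA implementation: it assumes the feature maps F_k^l and the gradients of the class score with respect to them have already been extracted from the model and are available as nested lists. The function name `grad_cam` and the toy values are hypothetical. The weights α_k^c are taken as the global average of each gradient map, as in standard Grad-CAM.

```python
def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from K feature maps and their gradients.

    feature_maps[k][x][y] plays the role of F_k^l(x, y).
    gradients[k][x][y] is d(score_c) / dF_k^l(x, y).
    Both are plain nested lists here (an assumption for illustration).
    """
    num_maps = len(feature_maps)
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])

    # alpha_k^c: global average of the gradient over each feature map.
    alphas = [
        sum(sum(row) for row in gradients[k]) / (height * width)
        for k in range(num_maps)
    ]

    # M_c(x, y) = ReLU(sum_k alpha_k^c * F_k^l(x, y))
    return [
        [
            max(0.0, sum(alphas[k] * feature_maps[k][x][y]
                         for k in range(num_maps)))
            for y in range(width)
        ]
        for x in range(height)
    ]


# Toy example: two 2x2 feature maps (rows = frequency bins, cols = time frames).
features = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
grads = [
    [[1.0, 1.0], [1.0, 1.0]],      # alpha = 1.0
    [[-1.0, -1.0], [-1.0, -1.0]],  # alpha = -1.0
]
heatmap = grad_cam(features, grads)
```

Cells where the weighted sum is negative are clipped to zero by the ReLU, so only regions that push the score of class c upward remain in the heatmap.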

Black-box: Coarse-to-Fine

Iteratively queries the model to identify important regions through a hierarchical refinement process

I(C_i) = [L(M(x̃_{-C_i}), y) - L(M(x), y)] / duration(C_i)
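A rough sketch of this score follows, under several assumptions that are not stated on this page: x̃_{-C_i} is read as the input with segment C_i masked, masking is done by silencing the samples, L is cross-entropy, and the model is only reachable through queries that return class probabilities. `toy_model`, `segment_importance`, and the toy sample rate are hypothetical names and values chosen for illustration.

```python
import math


def cross_entropy(probs, label):
    """Loss L: negative log-probability of the given label."""
    return -math.log(probs[label])


def segment_importance(model, loss_fn, audio, label, start, end, sample_rate):
    """I(C_i) = [L(M(x_masked), y) - L(M(x), y)] / duration(C_i)."""
    masked = list(audio)
    for i in range(start, end):
        masked[i] = 0.0  # assumption: masking means silencing the segment
    base_loss = loss_fn(model(audio), label)
    masked_loss = loss_fn(model(masked), label)
    duration = (end - start) / sample_rate  # seconds
    return (masked_loss - base_loss) / duration


def toy_model(audio):
    """Hypothetical 2-class model: class 0 is driven by the first 4 samples."""
    energy = sum(s * s for s in audio[:4])
    p0 = (1.0 + energy) / (2.0 + energy)
    return [p0, 1.0 - p0]


audio = [1.0] * 8
important = segment_importance(toy_model, cross_entropy, audio, 0, 0, 4, 4)
unimportant = segment_importance(toy_model, cross_entropy, audio, 0, 4, 8, 4)
```

Masking the first half raises the loss, so its score is positive. Masking the second half leaves the toy model's output unchanged, so its score is zero. Dividing by duration keeps long segments from winning simply because they cover more audio.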

Grad-CAM heatmap on mel-spectrogram

Segment Selection

Select top-k most important segments for adversarial modification

Priority-based Selection

Rank segments by importance scores and select the top-k regions that contribute most to classification
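The ranking step can be illustrated with a short sketch. It assumes segments are represented as (start, end) tuples in seconds and that one importance score per segment is already available. The function name `select_top_k` and the example values are hypothetical.

```python
def select_top_k(segments, scores, k):
    """Return the k highest-scoring segments, listed in time order."""
    ranked = sorted(zip(scores, segments), key=lambda pair: pair[0], reverse=True)
    chosen = [segment for _, segment in ranked[:k]]
    return sorted(chosen)


segments = [(0, 5), (5, 10), (10, 15), (15, 20)]
scores = [0.1, 0.7, 0.3, 0.9]
top_segments = select_top_k(segments, scores, k=2)
```

Returning the selection in time order is a convenience for the later inpainting pass, not something the page specifies.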

Refinement Strategy

Hierarchically refine coarse segments into finer granularity to pinpoint precise attack locations
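One way to picture the hierarchical refinement is the sketch below. It is an assumption about the general shape of such a procedure, not the authors' algorithm: coarse segments are scored, the best `keep` segments are halved and rescored, and the loop stops once every kept segment is no longer than `min_len`. The `score_fn` argument stands in for any per-segment importance score, and all names and values are hypothetical.

```python
def coarse_to_fine(score_fn, total_len, coarse_len, min_len, keep):
    """Refine coarse segments into up to `keep` fine (start, end) segments."""
    segments = [
        (start, min(start + coarse_len, total_len))
        for start in range(0, total_len, coarse_len)
    ]
    while True:
        # Rank the current candidates; in a black-box setting each score
        # would cost model queries.
        ranked = sorted(segments, key=lambda seg: score_fn(*seg), reverse=True)
        best = ranked[:keep]
        if all(end - start <= min_len for start, end in best):
            return sorted(best)
        # Halve every kept segment that is still longer than min_len.
        segments = []
        for start, end in best:
            if end - start <= min_len:
                segments.append((start, end))
            else:
                mid = (start + end) // 2
                segments.extend([(start, mid), (mid, end)])


def toy_score(start, end):
    """Hypothetical score: fraction of the segment inside frames 12..16."""
    overlap = max(0, min(end, 16) - max(start, 12))
    return overlap / (end - start)


fine_segments = coarse_to_fine(toy_score, total_len=32, coarse_len=8,
                               min_len=2, keep=2)
```

Only the kept segments are rescored at each level, so the number of scored segments grows with the depth of refinement rather than with the total number of fine segments.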

Selected important segments for inpainting

Adversarial Inpainting

Reconstruct selected segments using GACELA with adversarial guidance

GACELA Inpainting Model

GACELA, a generative adversarial context encoder for long audio inpainting, reconstructs the selected segments so that they stay musically coherent with the surrounding audio

Loss Function

Balance reconstruction quality and attack effectiveness through weighted loss combination

L = λ_rec L_rec + λ_att L_attack
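A small sketch of how such a weighted objective could be evaluated is given below. The page does not define L_rec or L_attack, so both definitions here are assumptions chosen for illustration: L_rec is a mean absolute error between the inpainted and reference samples, and L_attack is the log-probability of the true class, which decreases as the classifier loses confidence in the correct label. The default weights are placeholders, not values from the paper.

```python
import math


def reconstruction_loss(inpainted, reference):
    """Assumed L_rec: mean absolute error between two equal-length signals."""
    return sum(abs(a - b) for a, b in zip(inpainted, reference)) / len(reference)


def attack_loss(probs, true_label):
    """Assumed L_attack: log-probability of the true class (lower = stronger attack)."""
    return math.log(probs[true_label])


def combined_loss(inpainted, reference, probs, true_label,
                  lambda_rec=1.0, lambda_att=0.5):
    """L = lambda_rec * L_rec + lambda_att * L_attack."""
    return (lambda_rec * reconstruction_loss(inpainted, reference)
            + lambda_att * attack_loss(probs, true_label))


total = combined_loss(
    inpainted=[0.1, 0.2],
    reference=[0.0, 0.4],
    probs=[0.5, 0.5],
    true_label=0,
)
```

Raising lambda_rec favours fidelity to the original audio, while raising lambda_att favours misclassification, which is the trade-off the weighted sum is meant to control.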

Figure: before inpainting vs. after inpainting. Adversarial inpainting maintains audio quality.

Interactive Demo

Listen to adversarial attacks in action

Original Prediction

Blues

89%

Adversarial Prediction

Jazz

67%

Attack Status

✅ Success

Perceptual Metrics

FAD: 11.25

LSD: 1.58

MOS: 4.0/5

Inpainting Regions (timeline, 0 s to 30 s)

Results

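MAIA achieves higher attack success rates and better perceptual quality than the PGD, C&W, NES, and ZOO baselines in both white-box and black-box settings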

MAIA White-Box

Best

92.8%

Attack Success Rate

mAP Degradation: 0.845 → 0.488

FAD: 11.25

MOS: 4.0/5

Baselines (WB)

PGD / C&W

82.1-88.5%

Attack Success Rate

mAP Degradation: 0.845 → 0.560 - 0.619

FAD: 12.11 - 12.64

MOS: 3.1 - 3.4/5

MAIA Black-Box

Best

80.1%

Attack Success Rate

mAP Degradation: 0.845 → 0.594

FAD: 12.56

MOS: 3.6/5

Chart: Attack Success Rate Comparison

Chart: Perceptual Quality vs. Attack Success

Detailed Performance Metrics

Method	ASR ↑	mAP ↓	FAD ↓	LSD ↓	MOS ↑
White-Box Attacks (CSI)
PGD	82.1%	0.619	12.64	2.10	3.1
C&W	88.5%	0.560	12.11	1.94	3.4
MAIA-WB	92.8%	0.488	11.25	1.58	4.0
Black-Box Attacks (CSI)
NES	70.2%	0.682	13.93	2.27	2.8
ZOO	74.9%	0.639	13.51	2.12	3.0
MAIA-BB	80.1%	0.594	12.56	1.90	3.6
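To put the mAP column in relative terms, the arithmetic below computes the fractional drop from the clean mAP for each row. It assumes the clean value of 0.845 quoted for the MAIA results applies to every method in the table, since all rows are evaluated on the same CSI task. That shared baseline is an assumption, not something the table states.

```python
CLEAN_MAP = 0.845  # clean mAP quoted with the MAIA results (assumed shared by all rows)

# Post-attack mAP values copied from the table above.
attacked_map = {
    "PGD": 0.619,
    "C&W": 0.560,
    "MAIA-WB": 0.488,
    "NES": 0.682,
    "ZOO": 0.639,
    "MAIA-BB": 0.594,
}


def relative_drop(clean, attacked):
    """Fraction of the clean mAP lost after the attack."""
    return (clean - attacked) / clean


drops = {name: relative_drop(CLEAN_MAP, value)
         for name, value in attacked_map.items()}

for name, drop in sorted(drops.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {drop:.1%}")
```

Under that assumption, MAIA-WB removes roughly 42% of the clean mAP and MAIA-BB roughly 30%, the largest drops within their respective settings.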

Resources & Impact

Applications

🛡️

Copyright Protection

Test robustness of music copyright detection systems against adversarial attacks

🔍

Model Auditing

Evaluate security vulnerabilities in deployed MIR models before production

🔒

Privacy Evaluation

Assess privacy risks in music generation and recommendation systems

Code & Paper

💻

GitHub Repository

Anonymous submission - Code and models

📄

Paper (ISMIR 2025)

Full technical details and experiments

Citation

@inproceedings{maia2025,
  title={MAIA: Music Adversarial Inpainting Attack},
  author={Liu, Yuxuan and Zhang, Peihong and Sang, Rui and Li, Zhixin and Li, Shengchen},
  booktitle={Proceedings of the International Society for Music Information Retrieval Conference},
  year={2025}
}

Contact

Authors: Yuxuan Liu, Peihong Zhang, Rui Sang, Zhixin Li, Shengchen Li

Institution: Xi'an Jiaotong-Liverpool University

Email: shengchen.li@xjtlu.edu.cn