PhD Annual Report 2024-2025

PhD Research Journey

AI Security in Music Generation

A comprehensive research trajectory exploring the security landscape of music generation models—from understanding vulnerabilities through adversarial attacks, to developing defense mechanisms, to analyzing privacy implications through membership inference.

Research Narrative

Core Theme

Securing Music Generation Models: From Understanding Vulnerabilities to Defense and Privacy

My PhD research establishes a comprehensive framework for understanding and addressing security challenges in music generation models. The work progresses through three interconnected phases, each building upon insights from the previous stage to form a complete security lifecycle for music AI systems.

This research represents the first systematic investigation of adversarial robustness and privacy in the music domain, contributing novel attack methodologies, defense mechanisms, and inference techniques that bridge computer security and music information retrieval.

Research Progression

Phase 1: Understanding Vulnerabilities (MAIA, ISMIR 2025, Accepted)
  ↓ Motivated by attack findings
Phase 2: Defense & Evaluation (Perceptual Model, CMMR 2025, Accepted)
  ↓ Extended to privacy analysis
Phase 3: Privacy & Membership Inference (Dual-Domain Analysis)
  • LSA-Probe, Waveform Domain (ICASSP, Under Review)
  • TS-RaMIA, Symbolic Domain (AAAI Workshop, Under Review)

Research Publications

Phase 1: Attack
ISMIR 2025 (Accepted)

Music Adversarial Inpainting Attack (MAIA)

Adversarial attacks targeting specific segments of music generation models through selective inpainting, revealing critical vulnerabilities in regional model behaviors.

Key Contribution: First work to demonstrate regional adversarial attacks on music generation with Grad-CAM visualization for interpretable attack localization.
Role in Thesis: Establishes the attack surface and demonstrates that music generation models are vulnerable to targeted manipulation, motivating the need for defense mechanisms.
View Demo →
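To make the idea of a region-targeted attack concrete, here is a minimal, purely illustrative sketch of a region-constrained adversarial perturbation using projected gradient descent. It is not the actual MAIA algorithm: the "model" is a toy linear map standing in for a music generation model's differentiable loss, and all names and parameters are hypothetical.

```python
import numpy as np

# Illustrative sketch only: a generic region-constrained adversarial
# perturbation (sign-gradient PGD), NOT the MAIA method itself.
# f(x) = W @ x is a toy stand-in for a music model's differentiable output.

def region_pgd(x, W, target, mask, eps=0.1, step=0.02, iters=50):
    """Perturb only the masked region of x to push f(x) = W @ x toward target."""
    x_adv = x.copy()
    for _ in range(iters):
        residual = W @ x_adv - target                  # loss = ||W x - target||^2
        grad = 2.0 * W.T @ residual                    # closed-form gradient
        x_adv = x_adv - step * np.sign(grad) * mask    # update masked samples only
        delta = np.clip(x_adv - x, -eps, eps) * mask   # project into epsilon-ball
        x_adv = x + delta
    return x_adv

rng = np.random.default_rng(0)
x = rng.standard_normal(64)                # stand-in audio segment
W = rng.standard_normal((8, 64)) * 0.1
target = np.zeros(8)
mask = np.zeros(64)
mask[16:32] = 1.0                          # attack a single region only
x_adv = region_pgd(x, W, target, mask)
```

The mask is the key ingredient: samples outside the attacked region are never modified, mirroring the "regional" character of the attack described above.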
Phase 2: Defense
CMMR 2025 (Accepted)

Training a Perceptual Model for Evaluating Auditory Similarity in Music Adversarial Attack

Perceptual evaluation framework for assessing similarity in adversarial music generation, bridging the gap between automated attack detection and human auditory perception.

Key Contribution: Novel training methodology for perceptual models that aligns machine evaluation with human perception of adversarial perturbations in music.
Role in Thesis: Directly motivated by MAIA findings, develops defense capability by enabling accurate perceptual assessment of attack impact, completing the attack-defense cycle.
Paper →
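One common way to train such a perceptual model is a triplet-margin objective, which pulls human-judged similar clips together in embedding space. The sketch below shows only that generic objective with a toy linear embedding; the paper's actual architecture, features, and training data are not reproduced here, and all names are illustrative.

```python
import numpy as np

# Hedged sketch: a generic triplet-margin objective for learning a
# perceptual embedding where clips judged similar by humans sit closer
# than dissimilar ones. The linear embedding E is a toy stand-in.

def triplet_loss(E, anchor, positive, negative, margin=0.5):
    """Penalize when the positive is not closer to the anchor
    than the negative by at least `margin` (squared distances)."""
    a, p, n = E @ anchor, E @ positive, E @ negative
    d_pos = np.sum((a - p) ** 2)
    d_neg = np.sum((a - n) ** 2)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(1)
E = rng.standard_normal((16, 128)) * 0.1              # toy linear embedding
anchor = rng.standard_normal(128)
positive = anchor + 0.01 * rng.standard_normal(128)   # perceptually close clip
negative = rng.standard_normal(128)                   # unrelated clip
loss = triplet_loss(E, anchor, positive, negative)
```

Minimizing this loss over many human-labeled triplets is what aligns the learned distance with perceived auditory similarity.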
Phase 3: Privacy (Waveform)
ICASSP (Under Review)

LSA-Probe: Membership Inference via Latent Stability Analysis for Music Diffusion Models

Membership inference attack on audio diffusion models using latent space perturbation stability, revealing privacy vulnerabilities in waveform-based music generation.

Key Contribution: Two-loop adversarial probing framework that reveals training data membership through analyzing latent stability patterns across diffusion timesteps.
Role in Thesis: Extends security analysis from adversarial robustness to privacy, demonstrating that attack insights transfer to membership inference in the waveform domain.
View Demo →
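The intuition behind stability-based membership inference can be sketched in a few lines: perturb a sample's latent, let the model pull it back, and score membership by how tightly the reconstructions cluster. The toy "denoiser" below merely snaps points to nearby memorized latents; it stands in for a diffusion model and is emphatically not the LSA-Probe algorithm (no two-loop probing, no timestep analysis).

```python
import numpy as np

# Illustrative sketch of stability-based membership inference, not LSA-Probe.
# The toy denoiser snaps a latent to a memorized training latent when close;
# it is a crude stand-in for a diffusion model's denoising behavior.

def toy_denoiser(z, train_latents, radius=2.0):
    """Return the nearest memorized latent if z is within `radius`, else z."""
    d = np.linalg.norm(train_latents - z, axis=1)
    i = np.argmin(d)
    return train_latents[i] if d[i] < radius else z

def stability_score(z, train_latents, n_probes=32, sigma=0.3, seed=0):
    """Lower spread of denoised perturbations => more stable => more likely
    a training member (the working hypothesis of stability-based MIA)."""
    rng = np.random.default_rng(seed)
    recons = np.array([
        toy_denoiser(z + sigma * rng.standard_normal(z.shape), train_latents)
        for _ in range(n_probes)
    ])
    return recons.std(axis=0).mean()   # average per-dimension spread

rng = np.random.default_rng(2)
train_latents = rng.standard_normal((10, 8))   # toy "training set" latents
member = train_latents[0]
non_member = rng.standard_normal(8) * 3.0
```

In this toy setup, perturbations of a member collapse back onto its memorized latent (low spread), while a non-member's perturbations stay scattered, so thresholding the score separates the two.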
Phase 3: Privacy (Symbolic)
AAAI Workshop (Under Review)

TS-RaMIA: Time-Series Randomized Membership Inference Attack

Structural pattern analysis for membership inference in symbolic music generation models, exploiting time-series characteristics unique to ABC notation.

Key Contribution: First membership inference attack (MIA) framework for symbolic music, using structural tokens (StructTail) and meta-fusion to achieve superior performance over baseline methods.
Role in Thesis: Completes the dual-domain privacy analysis by addressing symbolic representation, demonstrating that privacy vulnerabilities exist across both audio and symbolic modalities.
View Demo →

Thesis Coherence

🔍

Attack → Defense

MAIA's discovery of regional vulnerabilities directly motivated the development of perceptual evaluation models. The attack insights informed defense strategy design.

🛡️

Defense → Privacy

Understanding attack-defense dynamics revealed the need to analyze training data privacy. Defense mechanisms inspired membership inference methodologies.

🔐

Dual-Domain Analysis

Privacy investigation extended across both waveform and symbolic domains, demonstrating that the same security principles carry over between the two major music representations.

Complete Security Lifecycle

This research establishes the first comprehensive security framework for music generation models, covering:

  • Attack Surface Analysis: Understanding how models can be compromised
  • Defense Mechanisms: Developing tools to detect and evaluate perturbations
  • Privacy Implications: Analyzing training data leakage across modalities

Together, these contributions form a cohesive PhD thesis that advances both the security and music AI communities, providing theoretical foundations and practical tools for securing next-generation music generation systems.