Semi-Supervised Speech Embedding Fusion for Parkinson’s Detection
We developed a novel fusion architecture that combines semi-supervised speech embeddings to detect Parkinson’s Disease (PD) from natural speech recordings of more than 1,300 participants, collected in both home and clinical environments.
- Leveraged deep speech embeddings (Wav2Vec 2.0, WavLM, ImageBind) to capture rich vocal features indicative of PD, moving beyond traditional handcrafted features.
- Designed a fusion model that projects and aligns multi-model speech embeddings into a unified feature space, improving classification performance relative to baseline approaches.
- Achieved strong classification performance (AUROC ≈ 88.9%, accuracy ≈ 85.7%) on internal evaluation and demonstrated generalizability on external clinical datasets.
- Conducted detailed bias and robustness analyses showing equitable performance across sex, ethnicity, and disease stages, supporting broader real-world applicability.
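The projection-and-alignment idea behind the fusion model can be illustrated with a minimal sketch. It is not the project's actual architecture: the function names, the toy embedding sizes, the random untrained weights, and the choice to fuse by averaging L2-normalized projections are all assumptions made for illustration.

```python
import math
import random

def make_projection(in_dim, out_dim, rng):
    """Random (untrained) weights standing in for a learned linear layer."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]

def project(weights, embedding):
    """Linear map: one dot product per output dimension."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weights]

def l2_normalize(vec):
    """Scale a vector to unit length so all models share one scale."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def fuse(embeddings, projections):
    """Project each model's embedding into the shared space, normalize, average."""
    projected = [
        l2_normalize(project(projections[name], emb))
        for name, emb in embeddings.items()
    ]
    dim = len(projected[0])
    return [sum(p[i] for p in projected) / len(projected) for i in range(dim)]

rng = random.Random(0)
# Toy input sizes chosen for readability; real embedding sizes are much larger.
input_dims = {"wav2vec2": 12, "wavlm": 12, "imagebind": 16}
shared_dim = 8

projections = {n: make_projection(d, shared_dim, rng) for n, d in input_dims.items()}
# Random vectors stand in for per-recording embeddings from each model.
embeddings = {n: [rng.gauss(0.0, 1.0) for _ in range(d)] for n, d in input_dims.items()}

fused = fuse(embeddings, projections)  # one shared-space vector per recording
```

In a trained system the projection weights would be learned jointly with the classifier, and the fused vector would feed a PD/non-PD classification head. Averaging is only one option; concatenation or attention-weighted pooling are common alternatives.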

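A subgroup analysis of the kind summarized above typically recomputes the evaluation metric within each demographic or clinical group. The sketch below shows one way to do that for AUROC. The records are made-up toy values, not study results, and the helper names are hypothetical.

```python
from collections import defaultdict

def auroc(labels, scores):
    """AUROC as the probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when a group lacks one of the classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auroc_by_group(records):
    """Compute AUROC separately for each group in (group, label, score) records."""
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in records:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    return {g: auroc(labels, scores) for g, (labels, scores) in grouped.items()}

# Toy records: (group, true label with 1 = PD, model score). Not real data.
records = [
    ("female", 1, 0.9), ("female", 0, 0.2), ("female", 1, 0.6), ("female", 0, 0.7),
    ("male", 1, 0.8), ("male", 0, 0.3), ("male", 1, 0.4), ("male", 0, 0.1),
]

per_group = auroc_by_group(records)
for group, value in sorted(per_group.items()):
    print(f"{group}: AUROC = {value:.2f}")
```

Comparing the per-group values, ideally with confidence intervals, indicates whether performance is comparable across groups. The pairwise computation here is quadratic in group size, so a rank-based implementation is preferable for large datasets.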