Recently, a research team led by Prof. Li Hai at the Institute of Health and Medical Technology, the Hefei Institutes of Physical Science of the Chinese Academy of Sciences, developed a new deep learning framework that significantly improves the accuracy and interpretability of detecting neurological disorders through speech.
"A slight change in the way we speak might be more than just a slip of the tongue; it could be a warning sign from the brain," said Prof. Li Hai, who led the team. "Our new model can detect early symptoms of neurological diseases like Parkinson's, Huntington's, and Wilson disease by analyzing voice recordings."
The study was recently published in Neurocomputing.
Dysarthria is a common early symptom of various neurological disorders. Because these speech abnormalities often reflect underlying neurodegenerative processes, voice signals have emerged as promising non-invasive biomarkers for early screening and continuous monitoring of such conditions. Automated speech analysis offers high efficiency, low cost, and non-invasiveness. However, current mainstream methods often suffer from over-reliance on handcrafted features, limited capacity to model temporal-variable interactions, and poor interpretability.
To address these challenges, the team proposed the Cross-Time and Cross-Axis Interactive Transformer (CTCAIT) for multivariate time series analysis. The framework first employs a large-scale audio model to extract high-dimensional temporal features from speech, representing them as multidimensional embeddings along the time and feature axes. It then leverages the InceptionTime network to capture multi-scale and multi-level patterns within the time series. By integrating cross-time and cross-channel multi-head attention mechanisms, CTCAIT effectively captures pathological speech signatures embedded across different dimensions.
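To give a rough sense of what attending across both time and feature channels can look like, here is a minimal sketch in plain Python. It is not the team's implementation: it assumes a single attention head with identity query/key/value projections, applies the two attention passes one after the other (the article does not state how CTCAIT combines them), and omits the audio model and the InceptionTime stage entirely. The function names and the toy embedding values are hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    # Scaled dot-product self-attention.
    # Assumption: Q, K and V are the tokens themselves (no learned weights).
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def cross_time_cross_axis(x):
    # x has T rows (time steps) and C columns (feature channels).
    # Pass 1: each time step is a token, so information mixes across time.
    time_mixed = self_attention(x)
    # Pass 2: each channel is a token, so information mixes across channels.
    return transpose(self_attention(transpose(time_mixed)))

# Toy stand-in for speech embeddings: 4 time steps, 3 feature channels.
embeddings = [
    [0.1, 0.4, -0.2],
    [0.3, 0.0, 0.5],
    [-0.1, 0.2, 0.1],
    [0.6, -0.3, 0.2],
]
mixed = cross_time_cross_axis(embeddings)
```

The output keeps the input's shape (time steps by channels), which is why the two passes can be chained by transposing between them. A real model would add learned projections, multiple heads, and residual connections.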
The method achieved a detection accuracy of 92.06% on a Mandarin Chinese dataset and 87.73% on an external English dataset, demonstrating strong cross-linguistic generalizability.
Furthermore, the team conducted interpretability analyses of the model's internal decision-making processes and systematically compared the effectiveness of different speech tasks, offering valuable insights for its potential clinical deployment.
These efforts provide important guidance for potential clinical applications of the method in early diagnosis and monitoring of neurological disorders.
Journal reference:
Zhang, Z., et al. (2025). Multivariate time series approach integrating cross-temporal and cross-channel attention for dysarthria detection from speech. Neurocomputing. doi.org/10.1016/j.neucom.2025.130708