Preliminary Evaluation of Automated Speech Recognition Apps for the Hearing Impaired and Deaf

Pragt, Leontien and van Hengel, Peter and Grob, Dagmar and Wasmann, Jan-Willem A. (2022) Preliminary Evaluation of Automated Speech Recognition Apps for the Hearing Impaired and Deaf. Frontiers in Digital Health, 4. ISSN 2673-253X

Abstract

Objective: Automated speech recognition (ASR) systems have become increasingly sophisticated and accurate, and can be deployed on many digital devices, including smartphones. This pilot study examines the speech recognition performance of ASR apps using audiological speech tests. In addition, we compare the speech recognition performance of ASR apps to that of normal-hearing and hearing-impaired listeners, and evaluate whether standard clinical audiological tests are a meaningful and quick measure of the performance of ASR apps.

Methods: Four apps were tested on a smartphone: AVA, Earfy, Live Transcribe, and Speechy. The Dutch audiological speech tests performed were speech audiometry in quiet (Dutch CNC-test), the Digits-in-Noise (DIN)-test with steady-state speech-shaped noise, and sentences in quiet and in averaged long-term speech-shaped spectrum noise (Plomp-test). For comparison, the apps' ability to transcribe a spoken dialogue (Dutch and English) was also tested.

Results: All apps scored at least 50% phonemes correct on the Dutch CNC-test at a conversational speech intensity level (65 dB SPL) and achieved 90–100% phoneme recognition at higher intensity levels. On the DIN-test, AVA and Live Transcribe achieved the lowest (best) signal-to-noise ratio, +8 dB. The lowest signal-to-noise ratio measured with the Plomp-test was +8 to +9 dB, for Earfy (Android) and Live Transcribe (Android). Overall, the word error rate for the dialogue in English (19–34%) was lower (better) than for the Dutch dialogue (25–66%).
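
For readers unfamiliar with the word error rate reported above, the following is a minimal sketch of how such a figure is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of reference words. The abstract does not state how the authors computed their scores, so the function name, the lowercasing and whitespace tokenization, and the example sentences are illustrative assumptions rather than the study's method.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are compared case-insensitively after splitting on
    whitespace; the study may have normalized its transcripts differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                       # deletion (reference word missed)
                curr[j - 1] + 1,                   # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,   # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one of six reference words is dropped by the transcriber.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"WER = {wer:.1%}")
```

Under this definition, a lower value means a better transcript, and the value can exceed 100% when the output contains many inserted words.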

Conclusion: The performance of the apps was limited on audiological tests that provide little linguistic context or use low signal-to-noise ratios. For Dutch audiological speech tests in quiet, ASR apps performed similarly to a person with moderate hearing loss. In noise, the ASR apps performed worse than most profoundly deaf people using a hearing aid or cochlear implant. Adding new performance metrics, including the semantic difference as a function of SNR and reverberation time, could help to monitor and further improve ASR performance.

Item Type: Article
Subjects: Opene Prints > Multidisciplinary
Depositing User: Managing Editor
Date Deposited: 03 Feb 2023 07:36
Last Modified: 30 Jul 2024 05:57
URI: http://geographical.go2journals.com/id/eprint/1270
