
Show simple item record

dc.contributor.author: Sachdeva, Shubam
dc.contributor.author: Ruan, Haoyao
dc.contributor.author: Hamarneh, Ghassan
dc.contributor.author: Behne, Dawn M.
dc.contributor.author: Jongman, Allard
dc.contributor.author: Sereno, Joan A.
dc.contributor.author: Wang, Yue
dc.date.accessioned: 2023-04-10T16:50:43Z
dc.date.available: 2023-04-10T16:50:43Z
dc.date.issued: 2023-01-28
dc.identifier.citation: Sachdeva, S., Ruan, H., Hamarneh, G. et al. Plain-to-clear speech video conversion for enhanced intelligibility. Int J Speech Technol 26, 163–184 (2023). https://doi.org/10.1007/s10772-023-10018-z
dc.identifier.uri: https://hdl.handle.net/1808/34078
dc.description.abstract: Clearly articulated speech, relative to plain-style speech, has been shown to improve intelligibility. We examine whether visible speech cues in video alone can be systematically modified to enhance clear-speech visual features and improve intelligibility. We extract clear-speech visual features of English words varying in vowels produced by multiple male and female talkers. Via a frame-by-frame image-warping based video generation method with a controllable parameter (displacement factor), we apply the extracted clear-speech visual features to videos of plain speech to synthesize clear-speech videos. We evaluate the generated videos using a robust, state-of-the-art AI Lip Reader as well as human intelligibility testing. The contributions of this study are: (1) we successfully extract relevant visual cues for video modifications across speech styles, and have achieved enhanced intelligibility for AI; (2) this work suggests that universal talker-independent clear-speech features may be utilized to modify any talker's visual speech style; (3) we introduce "displacement factor" as a way of systematically scaling the magnitude of displacement modifications between speech styles; and (4) the high-definition generated videos make them ideal candidates for human-centric intelligibility and perceptual training studies.
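The "displacement factor" described in the abstract can be read as a scalar that scales how far plain-speech facial landmarks are shifted toward their clear-speech positions before image warping. A minimal NumPy sketch of that scaling step, with hypothetical landmark coordinates (this is an illustration of the idea, not the authors' implementation):

```python
import numpy as np

def apply_displacement_factor(plain_landmarks, clear_delta, alpha):
    """Shift plain-speech landmarks by a scaled plain-to-clear displacement.

    plain_landmarks: (n_points, 2) landmark coordinates in a plain-speech frame
    clear_delta: (n_points, 2) plain-to-clear displacement for those landmarks
    alpha: displacement factor; 0 leaves the frame unchanged, 1 applies
           the full clear-speech shift, values in between scale it
    """
    return np.asarray(plain_landmarks) + alpha * np.asarray(clear_delta)

# Hypothetical example: two mouth-corner landmarks, a vertical "wider
# opening" displacement, and a displacement factor of 0.5.
plain = np.array([[100.0, 200.0], [140.0, 200.0]])
delta = np.array([[0.0, 4.0], [0.0, -4.0]])
warped = apply_displacement_factor(plain, delta, alpha=0.5)
# alpha=0.5 applies half the shift: [[100, 202], [140, 198]]
```

In the paper's pipeline, the shifted landmark positions would then drive a frame-by-frame image warp of the plain-speech video; the scalar `alpha` is what lets the modification be scaled systematically between styles.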
dc.publisher: Springer
dc.rights: © The Author(s) 2023. This article is licensed under a Creative Commons Attribution 4.0 International License.
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Video speech synthesis
dc.subject: Speech style
dc.subject: Intelligibility
dc.subject: AI lip reading
dc.subject: Speech enhancement
dc.title: Plain-to-clear speech video conversion for enhanced intelligibility
dc.type: Article
kusw.kuauthor: Jongman, Allard
kusw.kuauthor: Sereno, Joan A.
kusw.kudepartment: Linguistics
dc.identifier.doi: 10.1007/s10772-023-10018-z
dc.identifier.orcid: https://orcid.org/0000-0003-3862-3767
kusw.oaversion: Scholarly/refereed, publisher version
kusw.oapolicy: This item meets KU Open Access policy criteria.
dc.identifier.pmid: PMC10042924
dc.rights.accessrights: openAccess


