dc.contributor.author | Sachdeva, Shubam | |
dc.contributor.author | Ruan, Haoyao | |
dc.contributor.author | Hamarneh, Ghassan | |
dc.contributor.author | Behne, Dawn M. | |
dc.contributor.author | Jongman, Allard | |
dc.contributor.author | Sereno, Joan A. | |
dc.contributor.author | Wang, Yue | |
dc.date.accessioned | 2023-04-10T16:50:43Z | |
dc.date.available | 2023-04-10T16:50:43Z | |
dc.date.issued | 2023-01-28 | |
dc.identifier.citation | Sachdeva, S., Ruan, H., Hamarneh, G. et al. Plain-to-clear speech video conversion for enhanced intelligibility. Int J Speech Technol 26, 163–184 (2023). https://doi.org/10.1007/s10772-023-10018-z | en_US |
dc.identifier.uri | https://hdl.handle.net/1808/34078 | |
dc.description.abstract | Clearly articulated speech, relative to plain-style speech, has been shown to improve intelligibility. We examine whether visible speech cues in video alone can be systematically modified to enhance clear-speech visual features and improve intelligibility. We extract clear-speech visual features of English words varying in vowels produced by multiple male and female talkers. Via a frame-by-frame image-warping-based video generation method with a controllable parameter (displacement factor), we apply the extracted clear-speech visual features to videos of plain speech to synthesize clear-speech videos. We evaluate the generated videos using a robust, state-of-the-art AI lip reader as well as human intelligibility testing. The contributions of this study are: (1) we successfully extract relevant visual cues for video modifications across speech styles and achieve enhanced intelligibility for AI; (2) this work suggests that universal, talker-independent clear-speech features may be utilized to modify any talker’s visual speech style; (3) we introduce the “displacement factor” as a way of systematically scaling the magnitude of displacement modifications between speech styles; and (4) the high-definition generated videos make them ideal candidates for human-centric intelligibility and perceptual training studies. | en_US |
dc.publisher | Springer | en_US |
dc.rights | © The Author(s) 2023. This article is licensed under a Creative Commons Attribution 4.0 International License. | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | en_US |
dc.subject | Video speech synthesis | en_US |
dc.subject | Speech style | en_US |
dc.subject | Intelligibility | en_US |
dc.subject | AI lip reading | en_US |
dc.subject | Speech enhancement | en_US |
dc.title | Plain-to-clear speech video conversion for enhanced intelligibility | en_US |
dc.type | Article | en_US |
kusw.kuauthor | Jongman, Allard | |
kusw.kuauthor | Sereno, Joan A. | |
kusw.kudepartment | Linguistics | en_US |
dc.identifier.doi | 10.1007/s10772-023-10018-z | en_US |
dc.identifier.orcid | https://orcid.org/0000-0003-3862-3767 | en_US |
kusw.oaversion | Scholarly/refereed, publisher version | en_US |
kusw.oapolicy | This item meets KU Open Access policy criteria. | en_US |
dc.identifier.pmid | PMC10042924 | en_US |
dc.rights.accessrights | openAccess | en_US |