AbstractA recent result shows that inner speech can, with proper care, be decoded to the same high-level of accuracy as articulated speech. This relies, however, on neural data obtained while subjects perform elicited tasks, such as covert reading and repeating, whereas a neural speech prosthetic will require the decoding of inner speech that is self-generated. Prior work has, moreover, emphasised differences between these two kinds of inner speech, raising the question of how well a decoder optimised for one will generalise to the other. In this study, we trained phoneme-level decoders on an atypically large, elicited inner speech dataset, previously acquired using 7T fMRI in a single subject. We then acquired a second self-generated inner speech dataset in the same subject. Although the decoders were trained exclusively on neural recordings obtained during elicited inner speech, they predicted unseen phonemes accurately in both elicited and selfgenerated test conditions, illustrating the viability of zero-shot task transfer. This has significant practical importance for the development of a neural speech prosthetic, as labelled data is far easier to acquire at scale for elicited than for self-generated inner speech. Indeed, elicited tasks may be the only option for acquiring labelled data in critical patient populations who cannot control their vocal articulators.
Cold Spring Harbor Laboratory