
Audiovisual Speech Perception with Degraded Auditory Cues

Author: Elizabeth Anderson
Publisher:
ISBN:
Category:
Languages: en
Pages:

Book Description
Abstract: Speech perception, although generally assumed to be a primarily auditory process, also depends on visual cues. Audio and visual signals are used together not only when signals are compromised, such as in a noisy environment, but also when the signals are completely intelligible. McGurk and MacDonald (1976) demonstrated the integration of these cues in a paradigm known today as the McGurk effect. One possible explanation for the McGurk effect is the substantial redundancy in the auditory speech signal. An unanswered question concerns the circumstances that promote optimal perception of auditory and visual signals: is integration improved when one or both signals contain some ambiguity, or is a certain degree of redundancy necessary for integration to occur? If so, how much redundancy is necessary for optimal integration? The present study began to examine the amount of redundancy necessary for optimal auditory + visual integration. Audio portions of speech recordings were degraded using a software program that reduced the speech signals to four spectral bands, effectively reducing the redundancy of the auditory signal. Participant performance was assessed under four conditions: 1) degraded auditory only, 2) visual only, 3) degraded auditory + visual, and 4) non-degraded auditory + visual, to explore the degree of integration when the redundancy of the auditory signal is reduced. Integration was determined by 1) comparing the percent of integration in the degraded and non-degraded auditory + visual conditions against the degraded auditory-only and visual-only conditions, and 2) recording the percent of McGurk responses in the degraded auditory + visual condition. Results indicate that reducing the redundancy of the auditory signal has no significant effect on auditory + visual integration, suggesting that the amount of redundancy in the auditory signal does not influence the degree of multimodal integration.
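
One common way to produce this kind of reduced-spectral-band speech is noise-band vocoding in the spirit of Shannon et al. (1995): split the signal into a few frequency bands, keep only each band's temporal envelope, and use that envelope to modulate band-limited noise. The Python sketch below illustrates the idea; the band edges, filter orders, and envelope cutoff are illustrative assumptions, not the settings used in the study.

```python
# Minimal noise-band vocoder sketch (illustrative parameters, not the study's settings).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def vocode(signal, fs, band_edges_hz=(100, 800, 1500, 2500, 4000), env_cutoff_hz=160):
    """Reduce `signal` to len(band_edges_hz) - 1 noise-excited spectral bands."""
    rng = np.random.default_rng(0)
    output = np.zeros(len(signal))
    env_sos = butter(4, env_cutoff_hz, btype="lowpass", fs=fs, output="sos")
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        band_sos = butter(4, (lo, hi), btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, signal)                                # analysis band
        envelope = sosfiltfilt(env_sos, np.abs(hilbert(band)))              # temporal envelope
        carrier = sosfiltfilt(band_sos, rng.standard_normal(len(signal)))   # band-limited noise
        output += np.clip(envelope, 0, None) * carrier                      # envelope-modulated noise
    return output / (np.max(np.abs(output)) + 1e-12)                        # normalize to avoid clipping
```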

Audiovisual Integration of Reduced Information Speech Stimuli

Author: Meghan Hiss
Publisher:
ISBN:
Category:
Languages: en
Pages: 32

Book Description
Abstract: Every day, without knowing it, we use more than one sense to perceive speech. Speech perception is a combined effort that relies not only on auditory cues, but on visual cues as well. This has been observed in situations where one of the cues is impaired, leading to a reliance on the other cue to fill in the missing pieces. An example would be a noisy environment where the auditory cue is difficult to interpret; as a result, the individual will start to depend on his or her ability to interpret the visual cue. It has been found, however, that even when the auditory signal remains intact, individuals will still use visual cues and fuse the two inputs together. This is shown in the McGurk effect, in which listeners presented with an auditory stimulus of "ba" and a visual stimulus of "ga" most often perceive "da," a fusion of the two places of articulation. Numerous additional studies have investigated the integration of auditory and visual cues in more detail. In general, three aspects of the process have been identified as important determinants of audiovisual integration: talker characteristics, listener characteristics, and the effect of degrading the auditory stimulus. Previous studies in our lab have demonstrated the effects of degrading the auditory stimulus by reducing its spectral fine structure. Even with as few as four spectral channels of information, subjects found these stimuli highly intelligible. However, another means of reducing spectral information in speech, a reduction to a series of three sine waves that follow the general formant structure of the stimulus, was found by our subjects to be far less intelligible. Because these previous studies employed different groups of subjects, it is possible that the observed differences in performance were attributable to factors other than the reduced waveforms themselves. The present study addressed this question by performing a within-subjects comparison of intelligibility for these two types of auditory stimuli. In addition, we evaluated the potential priming effects that the order of stimulus presentation had on performance for the two types of stimuli. Six talkers and 12 listeners participated in this study. The 12 listeners were separated into three groups of four participants each. The type of auditory stimulus and the order in which it was presented varied across groups. The two stimulus types used in this study were 2-filter degraded speech and sine wave speech. The stimuli were 8 CVC syllables, all of which had the same medial vowel and differed only in the initial consonant. The first group was presented the stimuli in an alternating order, i.e., the listeners heard 2-filter degraded speech from a talker and then sine wave speech from the same talker. The second group heard all of the sine wave stimuli first and then all of the 2-filter degraded stimuli. The third group heard all of the 2-filter degraded stimuli first and then all of the sine wave stimuli. Each participant was tested under auditory-only presentation, followed by auditory plus visual presentation, for each stimulus type. Results demonstrated that participants performed far better with 2-filter speech than with sine wave speech. However, the order in which the stimuli were presented did not have a significant impact on participants' performance. Interestingly, subjects showed more audiovisual integration for sine wave speech than for the 2-filter speech, suggesting that a more highly degraded auditory stimulus promotes greater integration.

Auditory and Visual Information Facilitating Speech Integration

Author: Brandie Andrews
Publisher:
ISBN:
Category:
Languages: en
Pages:

Book Description
Abstract: Speech perception is often thought to be a unimodal process (using one sense) when, in fact, it is a multimodal process that uses both auditory and visual inputs. In certain situations where the auditory signal has become compromised, the addition of visual cues can greatly improve a listener's ability to perceive speech (e.g., in a noisy environment or because of a hearing loss). Interestingly, there is evidence that visual cues are used even when the auditory signal is completely intelligible, as demonstrated in the McGurk effect, in which simultaneous presentation of an auditory syllable "ba" with a visual syllable "ga" results in the perception of the sound "da," a fusion of the two inputs. Audiovisual speech perception ability varies widely across listeners; individuals integrate different amounts of auditory and visual information to understand speech. It is suggested that characteristics of the listener, characteristics of the auditory and visual inputs, and characteristics of the talker may all play a role in the variability of audiovisual integration. The present study explored the possibility that differences in talker characteristics (unique acoustic and visual characteristics of articulation) might be responsible for some of the variability in a listener's ability to perceive audiovisual speech. Ten listeners were presented with degraded auditory, visual, and audiovisual speech syllable stimuli produced by fourteen talkers. Results indicated substantial differences in intelligibility across talkers in the auditory-only condition, but little variability in visual-only intelligibility. In addition, talkers elicited widely varying amounts of audiovisual integration, but interestingly, the talkers eliciting the most audiovisual integration were not those with the highest auditory-only intelligibility.
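
The amount of audiovisual integration a talker elicits is typically inferred by comparing audiovisual scores with the unimodal baselines. The exact measure used in this study is not given in the abstract; one widely used summary is the relative audiovisual gain, sketched below with hypothetical scores.

```python
# Relative audiovisual gain: the share of the headroom above auditory-only
# performance that is recovered when visual cues are added. This is one common
# measure; it is not necessarily the metric used in the study described above.
def relative_av_gain(auditory_only: float, audiovisual: float) -> float:
    """Both arguments are proportions correct in [0, 1]."""
    headroom = 1.0 - auditory_only
    if headroom <= 0.0:
        return 0.0  # auditory-only performance is already at ceiling
    return (audiovisual - auditory_only) / headroom

# Hypothetical talker: 40% correct auditory-only, 70% correct audiovisual.
print(relative_av_gain(0.40, 0.70))  # 0.5 -> half of the available headroom recovered
```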

How Does Feedback Impact Training in Audio-visual Speech Perception?

Author: Amy Ranta
Publisher:
ISBN:
Category:
Languages: en
Pages: 29

Book Description
Abstract: Integration of visual and auditory speech cues is a process used by listeners in compromised listening situations, as well as in normal environments, as exemplified by the McGurk effect (McGurk and MacDonald, 1976). Audio-visual integration of speech appears to be a skill independent of the ability to process auditory or visual speech cues alone. Grant and Seitz (1998) argued for the independence of this process based on their findings that integration abilities could not be predicted from auditory-only or visual-only performance. Gariety (2009) and James (2009) further supported this argument by training listeners in the auditory-only modality with degraded speech syllables, then testing those listeners in the auditory-only, visual-only, and audio-visual conditions. Their results showed an increase in auditory-only performance, but no improvement in integration. Recently, DiStefano (2010) conducted a training study in which listeners were trained in the audio-visual modality with degraded speech syllables. Results showed that performance increased only in the audio-visual condition and did not increase in the auditory-only or visual-only conditions. Interestingly, performance improved only for stimulus pairs that were "congruent" (i.e., the auditory and visual inputs were the same syllable) and did not increase for "discrepant" stimuli (i.e., the auditory and visual inputs were different syllables). It is possible that the feedback provided in DiStefano's study impacted this pattern of results. However, the question remains as to whether integration of discrepant stimuli can be trained. In the present study, five listeners received ten hours of training sessions in the audio-visual condition with degraded speech signals similar to those used by Shannon et al. (1995). The feedback given during training was designed to encourage McGurk-type combination and fusion responses, in contrast to DiStefano's study, in which feedback was given to encourage responses that matched the auditory signal. A comparison of pre-training and post-training scores showed little to no improvement in auditory-only performance, and a slight decrease in visual-only performance for congruent stimuli. Further, a substantial increase in McGurk-type responses was seen from pre- to post-test for discrepant stimuli. These results provide further support for the view that integration is an independent process and that the feedback provided strongly influences response patterns. This strong lack of generalization from training should also be taken into account when designing effective integration training programs for aural rehabilitation.
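
Scoring in this kind of training study hinges on classifying each response to a discrepant trial as matching the auditory input, matching the visual input, or reflecting a McGurk-type fusion. The sketch below shows one simple way to do this; the fusion mapping and trial data are illustrative placeholders (the auditory "ba" plus visual "ga" yielding "da" case comes from the McGurk example cited above), not the study's scoring scheme.

```python
# Classify responses to discrepant (McGurk-type) audio-visual trials.
# The fusion mapping below covers only the classic example from the text;
# a real study would define the expected fusion for every discrepant pair.
FUSIONS = {("ba", "ga"): "da"}  # (auditory, visual) -> expected fused percept

def classify_response(auditory, visual, response):
    if response == FUSIONS.get((auditory, visual)):
        return "fusion"          # McGurk-type integrated response
    if response == auditory:
        return "auditory-match"
    if response == visual:
        return "visual-match"
    return "other"

# Example: percent of McGurk-type responses across hypothetical trials.
trials = [("ba", "ga", "da"), ("ba", "ga", "ba"), ("ba", "ga", "da")]
fusion_rate = sum(classify_response(*t) == "fusion" for t in trials) / len(trials)
print(fusion_rate)  # 0.666...
```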

Speech Recognition in Adverse Conditions

Author: Sven Mattys
Publisher: Psychology Press
ISBN: 1317836812
Category: Psychology
Languages: en
Pages: 326

Book Description
Speech recognition in ‘adverse conditions’ has been a familiar area of research in computer science, engineering, and hearing sciences for several decades. In contrast, most psycholinguistic theories of speech recognition are built upon evidence gathered from tasks performed by healthy listeners on carefully recorded speech, in a quiet environment, and under conditions of undivided attention. Building upon the momentum initiated by the Psycholinguistic Approaches to Speech Recognition in Adverse Conditions workshop held in Bristol, UK, in 2010, the aim of this volume is to promote a multi-disciplinary, yet unified approach to the perceptual, cognitive, and neuro-physiological mechanisms underpinning the recognition of degraded speech, variable speech, speech experienced under cognitive load, and speech experienced by theoretically relevant populations. This collection opens with a review of the literature and a formal classification of adverse conditions. The research articles then highlight those adverse conditions with the greatest potential for constraining theory, showing that some speech phenomena often believed to be immutable can be affected by noise, surface variations, or attentional set in ways that will force researchers to rethink their theory. This volume is essential for those interested in speech recognition outside laboratory constraints.

Analysis of Talker Characteristics in Audio-visual Speech Integration

Author: Kelly Dietrich
Publisher:
ISBN:
Category:
Languages: en
Pages: 68

Book Description
Abstract: Speech perception is commonly thought of as an auditory process, but in actuality it is a multimodal process that integrates both auditory and visual information. In certain situations where auditory information has been compromised, such as due to a hearing impairment or a noisy environment, visual cues help listeners to fill in missing pieces of auditory information during communication. Interestingly, even when both auditory and visual cues are entirely comprehensible alone, both are taken into account during speech perception. McGurk and MacDonald (1976) demonstrated that listeners not only benefit from the addition of visual cues during speech perception in situations where there is a lack of auditory information, but also that speech perception naturally employs audio-visual integration when both cues are available. Although a growing body of research has demonstrated that listeners integrate auditory and visual information during speech perception, there is a significant degree of variability in the audio-visual integration and benefit of listeners. Grant and Seitz (1998) demonstrated that the variability in audio-visual speech integration is, in part, a result of individual listener differences in multimodal integration ability. We suggest that individual characteristics of both the auditory signal and the talker might also influence the audio-visual speech integration process (Andrews, 2007; Hungerford, 2007; Huffman, 2007). Research from our lab has demonstrated a significant amount of variability in the performance of listeners on tasks of degraded auditory-only and audio-visual speech perception. Furthermore, these studies have revealed a significant amount of variability across different talkers in the degree of integration they elicit. The amount of information in the auditory signal clearly has an effect on audio-visual integration. However, in order to fully understand how different talkers and the varying information in the auditory signal impact audio-visual performance, an analysis of the speech waveform must be performed to directly compare acoustic characteristics with subject performance. The present study conducted a spectrographic analysis of the speech syllables of different talkers used in a previous perception study to evaluate individual acoustic characteristics. Behavioral confusion matrices were constructed, allowing us to examine the confusions demonstrated by listeners. Some of the behavioral confusions were easily explained by examining syllable formant tracks, while others were explained by the possibility that noise introduced into the waveform when the stimuli were degraded obscured subtle differences in the voice onset time of some confused syllables. Still other confusions were not easily explained by the analysis completed in the present study. The results of the present study provide the foundation for understanding aspects of the acoustic waveform and talker qualities that are desirable for optimal audio-visual speech integration and might also have implications for the design of future aural rehabilitation programs.
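
The behavioral confusion matrices mentioned above tabulate, for each presented syllable, how often listeners reported each possible response. A minimal sketch of how such a matrix can be built from trial data is shown below; the syllable set and trials are hypothetical placeholders, not the study's stimuli.

```python
# Build a stimulus-by-response confusion matrix from (presented, reported) trial pairs.
# The syllable set and trial data below are hypothetical placeholders.
from collections import Counter

syllables = ["ba", "da", "ga", "pa", "ta", "ka", "ma", "na"]
trials = [("ba", "ba"), ("ba", "da"), ("ga", "da"), ("pa", "pa")]  # (presented, reported)

counts = Counter(trials)
matrix = [[counts[(stim, resp)] for resp in syllables] for stim in syllables]

# Row-normalize to response proportions so frequent confusions stand out.
row_proportions = [
    [c / max(sum(row), 1) for c in row]
    for row in matrix
]
```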

Training Effects in Audio-visual Integration of Sine Wave Speech

Author: Megan Exner
Publisher:
ISBN:
Category:
Languages: en
Pages: 28

Book Description
Abstract: Speech perception is a bimodal process that involves both auditory and visual inputs. The auditory signal typically provides enough information for speech perception; however, when the auditory signal is compromised, such as when listening in a noisy environment or due to a hearing loss, people rely on visual cues to aid in understanding speech. Visual cues have been shown to significantly improve speech perception when the auditory signal is degraded in some way. McGurk and MacDonald (1976) demonstrated that speech perception is not a purely auditory process and that there is a visual influence even with perfect auditory input. There is growing interest in the benefit that listeners receive from audio-visual integration when the auditory signal is compromised. Remez et al. (1981) studied intelligibility when the speech waveform is reduced to three sine waves that represent the first three formants of the original signal, and discovered that sine wave speech is still highly intelligible even though a considerable amount of information has been removed from the speech signal. Grant and Seitz (1998) examined audio-visual integration performance of hearing-impaired listeners by comparing a variety of audio-visual integration tasks using nonsense syllables and sentences. The study's results showed that even when the auditory signal is poor, speech perception is greatly improved with the aid of visual cues. However, a large degree of variability was seen in the benefit that listeners receive from audio-visual integration. Further analysis suggested that at least some of this variability can be attributed to individual differences in listeners' abilities to integrate auditory and visual speech information. Studies in our lab have explored the differences in benefit that listeners receive from visual cues during audio-visual integration. We propose that one source of the variability in the benefit that listeners receive may be the overall amount of information available in the auditory signal. A previous study in our laboratory (Tamosiunas, 2007) explored the audio-visual benefit that listeners received using highly degraded sine wave speech. Results of that study indicated that listeners received little benefit from the addition of visual cues, and in some cases these cues actually inhibited speech perception. A possible explanation for the difficulties in speech perception found in that study was the limited exposure subjects had to sine wave speech. The present study explored whether the lack of audio-visual integration and benefit seen in Tamosiunas' (2007) study was a result of unfamiliarity with sine wave speech or whether this degree of auditory signal degradation inhibits audio-visual integration. To accomplish this, listeners in the present study were provided auditory and audio-visual training in sine wave speech perception. Results show that with training and exposure, speech perception performance did increase in both auditory and audio-visual conditions.
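
Sine wave speech of the kind described by Remez et al. (1981) replaces the speech waveform with a few sinusoids that trace the formant trajectories. The sketch below synthesizes such a signal from hypothetical formant tracks; in practice the tracks would be estimated from a real recording (for example, by LPC analysis).

```python
# Sine wave speech synthesis sketch: sum a few sinusoids that follow formant tracks.
# The formant trajectories and amplitudes below are hypothetical placeholders.
import numpy as np

def synthesize_sws(formant_tracks_hz, amplitudes, fs):
    """formant_tracks_hz, amplitudes: arrays of shape (n_formants, n_samples)."""
    phase = np.cumsum(formant_tracks_hz, axis=1) * (2.0 * np.pi / fs)  # instantaneous phase
    return np.sum(amplitudes * np.sin(phase), axis=0)

fs = 16000
n = fs  # one second of signal
f1 = np.linspace(300, 700, n)      # hypothetical first-formant glide
f2 = np.linspace(2200, 1200, n)    # hypothetical second-formant glide
f3 = np.full(n, 2500.0)            # hypothetical steady third formant
amps = np.vstack([np.full(n, 1.0), np.full(n, 0.5), np.full(n, 0.25)])
sws = synthesize_sws(np.vstack([f1, f2, f3]), amps, fs)
```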

Neural Correlates of Unimodal and Multimodal Speech Perception in Cochlear Implant Users and Normal-hearing Listeners

Author: Hannah E. Shatzer
Publisher:
ISBN:
Category: Cognitive psychology
Languages: en
Pages:

Book Description
Spoken word recognition often involves the integration of both auditory and visual speech cues. The addition of visual cues is particularly useful for individuals with hearing loss and cochlear implants (CIs), as the auditory signal they perceive is degraded compared to individuals with normal hearing (NH). CI users generally benefit more from visual cues than NH perceivers; however, the underlying neural mechanisms affording them this benefit are not well understood. The current study sought to identify the neural mechanisms active during auditory-only and audiovisual speech processing in CI users and determine how they differ from NH perceivers. Postlingually deaf experienced CI users and age-matched NH adults completed syllable and word recognition tasks during EEG recording, and the neural data were analyzed for differences in event-related potentials and neural oscillations. The results showed that during phonemic processing in the syllable task, CI users exhibited stronger AV integration, shifting processing away from primary auditory cortex and weighting the visual signal more strongly. During whole-word processing in the word task, early acoustic processing was preserved and similar to that of NH perceivers, again with robust AV integration. Lipreading ability also predicted suppression of early auditory processing across both CI and NH participants, suggesting that while some neural reorganization may have occurred in CI recipients to improve multisensory integrative processing, visual speech ability leads to reduced sensory processing in primary auditory cortex regardless of hearing status. These findings further support behavioral evidence for strong AV integration in CI users and the critical role of vision in improving speech perception.
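
The event-related potential analysis described above rests on epoching the continuous EEG around stimulus onsets, baseline-correcting each epoch, and averaging. The sketch below shows that core step with plain NumPy; the window, sampling rate, and event list are assumptions for illustration, not the study's analysis pipeline.

```python
# Epoch-and-average sketch for an event-related potential (ERP).
# The epoch window and event indices are illustrative assumptions.
import numpy as np

def erp_average(eeg, fs, event_samples, tmin=-0.2, tmax=0.6):
    """eeg: (n_channels, n_samples) array; event_samples: stimulus-onset sample indices."""
    pre, post = int(-tmin * fs), int(tmax * fs)
    epochs = []
    for onset in event_samples:
        if onset - pre < 0 or onset + post > eeg.shape[1]:
            continue  # skip events too close to the recording edges
        epoch = eeg[:, onset - pre:onset + post]
        baseline = epoch[:, :pre].mean(axis=1, keepdims=True)
        epochs.append(epoch - baseline)   # baseline-correct each epoch
    return np.mean(epochs, axis=0)        # (n_channels, n_times) average ERP
```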

The Role of Information Redundancy in Audiovisual Speech Integration

Author: Michelle Hungerford
Publisher:
ISBN:
Category:
Languages: en
Pages: 44

Book Description
Abstract: When most people think about communication, they think of its auditory aspect. Communication is made up of much more; speech perception depends on the integration of different senses, namely the auditory and visual systems. An everyday example of this is when someone tries to have a conversation at a noisy restaurant; a person may unconsciously pay attention to the speaker's facial movements in order to gain some visual information in an imperfect auditory situation. In general, listeners are able to use visual cues in impoverished auditory situations (like a noisy restaurant, or a hearing loss). However, this process also occurs when the auditory signal provides sufficient information alone. Grant and Seitz (1998) found that listeners differ greatly in their perception of auditory-visual speech. This study generated many questions about how integration occurs, namely what promotes "optimal integration." Research shows that many factors may be involved: characteristics of the listener, of the talker, or of the acoustic signal may all influence the amount of integration. The present study examined characteristics of the acoustic signal, namely whether removal of fine spectral information from the speech signal would elicit more use of visual cues, and thus greater audiovisual integration. The auditory stimuli were degraded by removing the spectral fine structure and replacing it with noise, but retaining the envelope structure. The stimuli were then passed through 2-, 4-, 6-, and 8-channel bandpass filter banks. Ten listeners with normal hearing were tested under auditory-only, visual-only, and auditory-plus-visual presentations. Results showed substantial auditory-visual integration over all conditions. Also, significant cross-talker effects were found in the 2- and 4-channel auditory-only conditions. However, the degree of integration elicited by the talkers was not related to auditory intelligibility. The results of this study have implications for our understanding of the auditory-visual integration process.

Neural Mechanisms of Perceptual Categorization as Precursors to Speech Perception

Author: Einat Liebenthal
Publisher: Frontiers Media SA
ISBN: 2889451585
Category:
Languages: en
Pages: 188

Book Description
Perceptual categorization is fundamental to the brain’s remarkable ability to process large amounts of sensory information and efficiently recognize objects, including speech. Perceptual categorization is the neural bridge between lower-level sensory and higher-level language processing. A long line of research on the physical properties of the speech signal as determined by the anatomy and physiology of the speech production apparatus has led to descriptions of the acoustic information that is used in speech recognition (e.g., stop consonant place and manner of articulation, voice onset time, aspiration). Recent research has also considered what visual cues are relevant to visual speech recognition (i.e., the visual counterparts used in lipreading or audiovisual speech perception). Much of the theoretical work on speech perception was done in the twentieth century without the benefit of neuroimaging technologies and models of neural representation. Recent progress in understanding the functional organization of sensory and association cortices based on advances in neuroimaging presents the possibility of achieving a comprehensive and far-reaching account of perception in the service of language. At the level of cell assemblies, research in animals and humans suggests that neurons in the temporal cortex are important for encoding biological categories. On the cellular level, different classes of neurons (interneurons and pyramidal neurons) have been suggested to play differential roles in the neural computations underlying auditory and visual categorization. The moment is ripe for a research topic focused on neural mechanisms mediating the emergence of speech representations (including auditory, visual, and even somatosensory based forms). Important progress can be achieved by juxtaposing within the same research topic the knowledge that currently exists, the identified lacunae, and the theories that can support future investigations. This research topic provides a snapshot of, and platform for discussion of, the current understanding of neural mechanisms underlying the formation of perceptual categories and their relationship to language, from a multidisciplinary and multisensory perspective. It includes contributions (reviews, original research, methodological developments) pertaining to the neural substrates, dynamics, and mechanisms underlying perceptual categorization and their interaction with neural processes governing speech perception.