An end-to-end audiovisual speech recognition algorithm was proposed. In this algorithm, a sparse DBN was constructed by introducing a mixed l<sub>1/2</sub> norm and l<sub>1</sub> norm penalty into a Deep Belief Network with a bottleneck structure, and was used to extract sparse bottleneck features that reduce the dimensionality of the input features; a BLSTM was then used to model these features over time. Next, an attention mechanism was used to automatically align and fuse the visual lip information with the auditory audio information. Finally, the fused audiovisual representation was classified by a BLSTM with a Softmax output layer attached.
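The abstract does not give layer sizes, penalty weights, or the exact attention formulation, so the following is only a minimal PyTorch sketch of the described pipeline under assumed dimensions: a bottleneck encoder standing in for the pretrained sparse DBN, a mixed l<sub>1/2</sub> + l<sub>1</sub> sparsity penalty, per-stream BLSTMs, additive attention that aligns visual frames to audio frames, and a final BLSTM with a Softmax-based classifier. All module names, sizes, and the additive-attention choice are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def mixed_sparsity_penalty(activations, lam_half=1e-4, lam_one=1e-4, eps=1e-8):
    """Mixed l1/2 + l1 penalty; where it enters the loss (activations vs. weights,
    pretraining vs. fine-tuning) is an assumption not specified in the abstract."""
    l_half = (activations.abs() + eps).sqrt().sum()
    l_one = activations.abs().sum()
    return lam_half * l_half + lam_one * l_one


class BottleneckEncoder(nn.Module):
    """Stand-in for the sparse DBN with a bottleneck layer (sizes are assumptions)."""
    def __init__(self, in_dim, hidden_dim=512, bottleneck_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, bottleneck_dim), nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (batch, time, in_dim)
        return self.net(x)


class AttentionFusion(nn.Module):
    """Additive attention that aligns visual frames to each audio frame."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.score = nn.Linear(dim, 1)

    def forward(self, audio, visual):          # both: (batch, time, dim)
        q = self.query(audio).unsqueeze(2)             # (B, Ta, 1, D)
        k = self.key(visual).unsqueeze(1)              # (B, 1, Tv, D)
        e = self.score(torch.tanh(q + k)).squeeze(-1)  # (B, Ta, Tv)
        w = F.softmax(e, dim=-1)
        aligned_visual = torch.bmm(w, visual)          # (B, Ta, D)
        return torch.cat([audio, aligned_visual], dim=-1)


class AVSRModel(nn.Module):
    def __init__(self, audio_dim, visual_dim, bottleneck_dim=64,
                 lstm_dim=128, num_classes=10):
        super().__init__()
        self.audio_enc = BottleneckEncoder(audio_dim, bottleneck_dim=bottleneck_dim)
        self.visual_enc = BottleneckEncoder(visual_dim, bottleneck_dim=bottleneck_dim)
        self.audio_blstm = nn.LSTM(bottleneck_dim, lstm_dim,
                                   batch_first=True, bidirectional=True)
        self.visual_blstm = nn.LSTM(bottleneck_dim, lstm_dim,
                                    batch_first=True, bidirectional=True)
        self.fusion = AttentionFusion(2 * lstm_dim)
        self.classifier_blstm = nn.LSTM(4 * lstm_dim, lstm_dim,
                                        batch_first=True, bidirectional=True)
        # Softmax is applied implicitly via cross-entropy on these logits.
        self.out = nn.Linear(2 * lstm_dim, num_classes)

    def forward(self, audio, visual):
        a, _ = self.audio_blstm(self.audio_enc(audio))    # temporal audio model
        v, _ = self.visual_blstm(self.visual_enc(visual)) # temporal visual model
        fused = self.fusion(a, v)                         # attention-based fusion
        h, _ = self.classifier_blstm(fused)
        return self.out(h.mean(dim=1))                    # utterance-level logits
```

In a training loop, `mixed_sparsity_penalty` would be added to the classification loss to encourage sparse bottleneck activations; the pooling to an utterance-level prediction is likewise an assumption, since the abstract does not state whether decoding is frame-level or utterance-level.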
Experiments show that the algorithm can effectively recognize visual and auditory information, and achieves a good recognition rate and robustness compared with similar algorithms.