Robust Speech Recognition by Combining Short-Term and Long-Term Spectrum Based Position-Dependent CMN with Conventional CMN

WANG, Longbiao; NAKAGAWA, Seiichi; KITAOKA, Norihide

http://hdl.handle.net/2237/14966

File
393.pdf (350.5 kB)

Item type

Journal Article

Release date

2011-06-28

Access rights

open access

Access rights URI

http://purl.org/coar/access_right/c_abf2

Keywords (subject scheme: Other)

robust speech recognition
distant-talking environment
CMN
long-term spectrum

Abstract

In a distant-talking environment, the channel impulse response is longer than the short-term spectral analysis window. Conventional short-term spectrum based Cepstral Mean Normalization (CMN) is therefore not effective under these conditions. In this paper, we propose a robust speech recognition method that combines a short-term spectrum based CMN with a long-term one. We assume that a static speech segment (such as a vowel) affected by reverberation can be modeled by a long-term cepstral analysis. Thus, the effect of long reverberation on a static speech segment may be compensated by the long-term spectrum based CMN. The cepstral distance between neighboring frames is used to discriminate between static speech segments (long-term spectrum) and non-static speech segments (short-term spectrum). The cepstra of the static and non-static speech segments are normalized by the corresponding cepstral means. In a previous study, we proposed an environmentally robust speech recognition method based on Position-Dependent CMN (PDCMN), which compensates for channel distortion depending on speaker position and is more efficient than conventional CMN. In this paper, the concept of combining short-term and long-term spectrum based CMN is extended to PDCMN. We call this Variable Term spectrum based PDCMN (VT-PDCMN). Because a position-dependent cepstral mean contains the average speaker characteristics over all speakers, PDCMN/VT-PDCMN cannot normalize speaker variations; we therefore also combine PDCMN/VT-PDCMN with conventional CMN in this study. We evaluated the proposed method on limited-vocabulary (100 words) distant-talking isolated word recognition in a real environment. The proposed method achieved a relative error reduction rate of 60.9% over the conventional short-term spectrum based CMN and 30.6% over the short-term spectrum based PDCMN.
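The static/non-static normalization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Euclidean cepstral distance, the fixed `threshold`, and the assumption that short-window and long-window cepstra are already computed and frame-aligned are all hypothetical choices made for illustration. The position-dependent means of PDCMN and the combination with conventional CMN are not modeled here.

```python
import math


def cepstral_distance(a, b):
    """Euclidean distance between two cepstral vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def label_static(frames, threshold):
    """Label frame t as static when its distance to frame t-1 is below threshold.

    The first frame has no predecessor and is treated as non-static.
    """
    if not frames:
        return []
    labels = [False]
    for prev, cur in zip(frames, frames[1:]):
        labels.append(cepstral_distance(prev, cur) < threshold)
    return labels


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def variable_term_cmn(short_cep, long_cep, threshold):
    """Normalize static frames with the long-term mean, others with the short-term mean.

    short_cep, long_cep: frame-aligned lists of cepstral vectors obtained with
    a short and a long analysis window (hypothetical inputs).
    Returns (normalized_frames, static_labels).
    """
    labels = label_static(short_cep, threshold)
    static = [lc for lc, s in zip(long_cep, labels) if s]
    nonstatic = [sc for sc, s in zip(short_cep, labels) if not s]
    static_mean = mean_vector(static) if static else None
    nonstatic_mean = mean_vector(nonstatic) if nonstatic else None

    normalized = []
    for sc, lc, s in zip(short_cep, long_cep, labels):
        # Static frames use the long-term cepstrum and its mean;
        # non-static frames use the short-term cepstrum and its mean.
        source, mean = (lc, static_mean) if s else (sc, nonstatic_mean)
        normalized.append([x - m for x, m in zip(source, mean)])
    return normalized, labels
```

Calling `variable_term_cmn(short_cep, long_cep, threshold=0.5)` returns the normalized frames together with the static/non-static labels. In practice the threshold would need tuning on held-out data, and the cepstral means could be estimated per speaker position rather than per utterance.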

Publisher

Institute of Electronics, Information and Communication Engineers

Language

eng

Resource type URI

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Version type

VoR (Version of Record)

Version type URI

http://purl.org/coar/version/c_970fb48d4fbd8a85

Versions

Ver.1

2021-03-01 18:39:53.216936

Cite as

WANG, Longbiao; NAKAGAWA, Seiichi; KITAOKA, Norihide (2008). Robust Speech Recognition by Combining Short-Term and Long-Term Spectrum Based Position-Dependent CMN with Conventional CMN. Institute of Electronics, Information and Communication Engineers, pp. 457–466.
