Robust Speech Recognition by Combining Short-Term and Long-Term Spectrum Based Position-Dependent CMN with Conventional CMN
http://hdl.handle.net/2237/14966
Name / File | License
---|---
393.pdf (350.5 kB), https://nagoya.repo.nii.ac.jp/record/13072/files/393.pdf | license free
Item type | Journal Article
---|---
Publication date | 2011-06-28
Title | Robust Speech Recognition by Combining Short-Term and Long-Term Spectrum Based Position-Dependent CMN with Conventional CMN
Authors | WANG, Longbiao; NAKAGAWA, Seiichi; KITAOKA, Norihide
Rights | Copyright (C) 2008 IEICE
Keywords (subject scheme: Other) | robust speech recognition; distant-talking environment; CMN; long-term spectrum
Abstract | In a distant-talking environment, the channel impulse response is longer than the short-term spectral analysis window. Conventional short-term spectrum based Cepstral Mean Normalization (CMN) is therefore not effective under these conditions. In this paper, we propose a robust speech recognition method that combines a short-term spectrum based CMN with a long-term one. We assume that a static speech segment (such as a vowel) affected by reverberation can be modeled by a long-term cepstral analysis. Thus, the effect of long reverberation on a static speech segment may be compensated by the long-term spectrum based CMN. The cepstral distance between neighboring frames is used to discriminate between static speech segments (long-term spectrum) and non-static speech segments (short-term spectrum). The cepstra of the static and non-static speech segments are normalized by the corresponding cepstral means. In a previous study, we proposed an environmentally robust speech recognition method based on Position-Dependent CMN (PDCMN), which compensates for channel distortion depending on speaker position and is more efficient than conventional CMN. In this paper, the concept of combining short-term and long-term spectrum based CMN is extended to PDCMN. We call this Variable Term spectrum based PDCMN (VT-PDCMN). Because a position-dependent cepstral mean contains the average speaker characteristics over all speakers, PDCMN/VT-PDCMN cannot normalize speaker variations; we therefore also combine PDCMN/VT-PDCMN with conventional CMN in this study. We evaluated the proposed method on limited-vocabulary (100 words) distant-talking isolated word recognition in a real environment. The proposed method achieved a relative error reduction rate of 60.9% over the conventional short-term spectrum based CMN and 30.6% over the short-term spectrum based PDCMN.
Description type | Abstract
Publisher | Institute of Electronics, Information and Communication Engineers
Language | eng
Resource type | journal article (http://purl.org/coar/resource_type/c_6501)
Source identifier (ISSN) | 0916-8532
Bibliographic information | IEICE Transactions on Information and Systems, Vol. E91-D, No. 3, pp. 457-466, issued 2008-03-01
Author version flag | publisher
Identifier (URI) | http://www.ieice.org/jpn/trans_online/index.html
Identifier (HDL) | http://hdl.handle.net/2237/14966
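
The abstract describes splitting frames into static and non-static segments by the cepstral distance between neighboring frames, then normalizing each group by its own cepstral mean. The following is a minimal Python sketch of that idea only, not the authors' implementation. It assumes the cepstra are already computed and reuses the same per-frame cepstra for both groups, whereas the paper applies a long-term analysis to static segments. The function names, the Euclidean distance, the `threshold` value, and the rule for labeling the first frame are all hypothetical choices made for illustration.

```python
import math


def cepstral_distance(a, b):
    """Euclidean distance between two cepstral vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def label_static(frames, threshold):
    """Return one boolean per frame: True if the frame looks static.

    A frame is called static when it lies within `threshold` of its
    predecessor. The first frame has no predecessor, so it copies the
    label of the second frame (an assumption of this sketch).
    """
    labels = [False] * len(frames)
    for t in range(1, len(frames)):
        labels[t] = cepstral_distance(frames[t], frames[t - 1]) < threshold
    if len(frames) > 1:
        labels[0] = labels[1]
    return labels


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def variable_term_cmn(frames, threshold=0.5):
    """Subtract the static-group mean from static frames and the
    non-static-group mean from non-static frames.

    Returns (normalized_frames, labels).
    """
    labels = label_static(frames, threshold)
    means = {}
    for flag in (True, False):
        group = [f for f, s in zip(frames, labels) if s == flag]
        if group:
            means[flag] = mean_vector(group)
    normalized = [
        [c - m for c, m in zip(f, means[s])] for f, s in zip(frames, labels)
    ]
    return normalized, labels


if __name__ == "__main__":
    # Toy two-dimensional "cepstra": three near-identical frames, then two jumps.
    toy = [[1.0, 2.0], [1.1, 2.0], [1.0, 2.1], [3.0, 0.0], [5.0, -2.0]]
    normalized, labels = variable_term_cmn(toy, threshold=0.5)
    print(labels)
    print(normalized)
```

In a position-dependent variant such as the PDCMN described in the abstract, the two means would presumably be estimated in advance for each speaker position rather than from the utterance itself; this sketch does not model that step.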