Acoustic Feature Transformation Based on Generalized Criteria for Speech Recognition
http://hdl.handle.net/2237/14293
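Name / File | License | Action |
---|---|---|
k8967.pdf (application/pdf, 839.7 kB; available from 2018-02-20) | | https://nagoya.repo.nii.ac.jp/record/12410/files/k8967.pdf |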
Item type | Thesis or Dissertation (1) |
---|---|
Publication date | 2010-10-27 |
Title | |
Title | Acoustic Feature Transformation Based on Generalized Criteria for Speech Recognition |
Language | en |
Other title | |
Other title | 音声認識における音響特徴変換の最適化基準の一般化に関する研究 (A Study on the Generalization of Optimization Criteria for Acoustic Feature Transformation in Speech Recognition) |
Language | ja |
Author | 坂井, 誠 (SAKAI, Makoto) |
Access rights | |
Access rights | open access |
Access rights URI | http://purl.org/coar/access_right/c_abf2 |
Abstract | |
Description | This thesis deals with acoustic feature transformations in automatic speech recognition that improve the baseline performance of a speech recognizer. The aim of these transformations is to reduce the dimensionality of long-term speech features without losing discriminative information among different phonetic classes.<br/>First, we focus on optimizing acoustic feature transformations with criteria that maximize the ratio of between-class scatter to within-class scatter. This approach, which is frequently used in practice, is based on a family of functions of scatter or covariance matrices. Typical methods in this approach include linear discriminant analysis (LDA), heteroscedastic linear discriminant analysis (HLDA), and heteroscedastic discriminant analysis (HDA). Although LDA, HLDA, and HDA are the most widely used methods in speech recognition, the connections between them have so far been overlooked. By developing a unified mathematical framework, we identify their close relationships and analyze them in detail. This framework, termed power LDA (PLDA), can describe various criteria by varying its control parameter, and it includes LDA, HLDA, and HDA as special cases. A selection method that automatically determines a sub-optimal control parameter is also provided.<br/>We also investigate combining acoustic feature transformations with discriminative training of acoustic models and obtain additional performance improvements. However, because the transformation methods mentioned above implicitly assume that the data in each class are generated by a single Gaussian distribution, they may cause an unexpected dimensionality reduction when the data in a class consist of several clusters. This study therefore extends HDA and PLDA to handle class distributions with several clusters.<br/>Second, we focus on acoustic feature transformations that minimize a form of classification error between different phonetic classes. Because the performance of speech recognition systems generally correlates strongly with the classification accuracy of the features, the features should be able to discriminate between different classes. Existing methods in this approach minimize the average classification error between classes. Although minimizing the average classification error suppresses the total classification error, it cannot prevent considerable overlap between the distributions of some low-frequency classes. Such overlap is critical for speech recognition, because some class pairs may then have little or no discriminative information with respect to each other. To avoid large errors between any pair of classes, we propose methods that minimize the maximum classification error instead of the average classification error. In addition, we propose interpolation methods that minimize the maximum classification error while also minimizing the average classification error; these achieved the best results. |
Language | en |
Description type | Abstract |
Description | |
Description | Nagoya University doctoral dissertation. Degree type: Doctor of Information Science (doctoral program). Date of conferral: September 30, 2010 (Heisei 22). |
Language | ja |
Description type | Other |
Language | |
Language | eng |
Resource type | |
Resource type URI | http://purl.org/coar/resource_type/c_db06 |
Resource type | doctoral thesis |
Bibliographic information | Issue date: 2010-09-30 |
Degree name | |
Language | ja |
Degree name | 博士(情報科学) (Doctor of Information Science) |
Degree-granting institution | |
Institution identifier scheme | kakenhi |
Institution identifier | 13901 |
Language | ja |
Institution name | 名古屋大学 |
Language | en |
Institution name | Nagoya University |
Academic year of conferral | |
Academic year of conferral | 2010 |
Date of conferral | |
Date of conferral | 2010-09-30 |
Dissertation number | |
Dissertation number | 甲第8967号 (Kō No. 8967) |
Author version flag | |
Value | publisher |
URI | |
Identifier | http://hdl.handle.net/2237/14293 |
Identifier type | HDL |
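The abstract speaks of reducing the dimensionality of long-term speech features. One common way to build such features is to splice each frame with its neighbouring frames and then apply a linear projection. The minimal Python sketch below assumes that construction; the context width, the edge-replication rule, and the helper names `splice` and `project` are hypothetical illustrations, not taken from the thesis.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def splice(frames: List[Vector], context: int) -> List[Vector]:
    """Concatenate each frame with its +/-context neighbouring frames.

    Frames beyond the utterance edges are replaced by the first/last frame
    (edge replication is an assumption of this sketch).
    """
    n = len(frames)
    spliced: List[Vector] = []
    for t in range(n):
        window: Vector = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to a valid frame index
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


def project(x: Vector, theta: Matrix) -> Vector:
    """Apply a p x d projection matrix theta to a d-dimensional vector x."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in theta]


# Toy utterance: three 2-dimensional frames, spliced with one frame of context
# on each side (6 dimensions), then projected down to 2 dimensions.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
long_term = splice(frames, context=1)
theta = [[0, 0, 1, 0, 0, 0],   # hypothetical row: first coefficient of the centre frame
         [1, 0, 1, 0, 1, 0]]   # hypothetical row: sum of first coefficients in the window
print([project(x, theta) for x in long_term])
```

In the thesis's setting, the projection matrix would be estimated by a criterion such as LDA, HDA, or PLDA rather than written by hand as it is here.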
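LDA chooses projections that maximize the ratio of between-class scatter to within-class scatter. As a minimal sketch of that criterion, and not the thesis's implementation, the two-class, two-dimensional case below computes the Fisher direction `w ∝ S_w^{-1} (m_a - m_b)` and the scatter ratio of the data projected onto a direction. The toy points and function names are invented for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def mean(points: List[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points: List[Point]) -> List[List[float]]:
    """2x2 scatter matrix: sum_i (x_i - m)(x_i - m)^T."""
    mx, my = mean(points)
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    return [[sxx, sxy], [sxy, syy]]


def fisher_direction(a: List[Point], b: List[Point]) -> Point:
    """Two-class Fisher LDA direction w = S_w^{-1} (m_a - m_b), with S_w = S_a + S_b."""
    sa, sb = scatter(a), scatter(b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]  # assumes S_w is nonsingular
    inv = [[sw[1][1] / det, -sw[0][1] / det], [-sw[1][0] / det, sw[0][0] / det]]
    ma, mb = mean(a), mean(b)
    d = (ma[0] - mb[0], ma[1] - mb[1])
    return (inv[0][0] * d[0] + inv[0][1] * d[1], inv[1][0] * d[0] + inv[1][1] * d[1])


def fisher_ratio(w: Point, a: List[Point], b: List[Point]) -> float:
    """Between-class over within-class scatter of the 1-D projection onto w."""
    pa = [w[0] * p[0] + w[1] * p[1] for p in a]
    pb = [w[0] * p[0] + w[1] * p[1] for p in b]
    ma, mb = sum(pa) / len(pa), sum(pb) / len(pb)
    within = sum((v - ma) ** 2 for v in pa) + sum((v - mb) ** 2 for v in pb)
    return (ma - mb) ** 2 / within


# Invented toy classes: b is a copy of a shifted by (1, -1).
a = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (1.0, 0.0)]
b = [(1.0, -1.0), (2.0, 0.0), (3.0, 1.0), (2.0, -1.0)]
w = fisher_direction(a, b)
print("Fisher direction:", w, "ratio:", fisher_ratio(w, a, b))
print("mean-difference direction ratio:", fisher_ratio((-1.0, 1.0), a, b))
```

With a nonsingular `S_w`, no other direction yields a larger `fisher_ratio`. For more than two classes, LDA instead solves a generalized eigenvalue problem and keeps at most C - 1 directions.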
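The abstract states that PLDA describes various criteria by varying a control parameter and contains LDA, HLDA, and HDA as special cases. The sketch below illustrates that idea only for a one-dimensional projection, where determinants reduce to scalars. It assumes that the denominator is a prior-weighted power mean of the projected class variances. Under that assumption, `m = 1` gives the LDA ratio with the pooled within-class covariance, and `m = 0` (geometric mean) gives the HDA objective as it is usually stated, up to taking an N-th root. The function names are hypothetical. Which `m` corresponds to HLDA, and how PLDA is defined for multi-dimensional projections, are specified in the thesis and not reproduced here.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def quad_form(w: Sequence[float], mat: Matrix) -> float:
    """Return w^T mat w, the variance of data with covariance mat projected onto w."""
    n = len(w)
    return sum(w[i] * mat[i][j] * w[j] for i in range(n) for j in range(n))


def power_mean(values: Sequence[float], weights: Sequence[float], m: float) -> float:
    """Weighted power mean (sum_k p_k v_k^m)^(1/m); m = 0 gives the geometric mean."""
    if m == 0:
        return math.exp(sum(p * math.log(v) for p, v in zip(weights, values)))
    return sum(p * v ** m for p, v in zip(weights, values)) ** (1.0 / m)


def power_criterion(w: Sequence[float], between: Matrix, class_covs: Sequence[Matrix],
                    priors: Sequence[float], m: float) -> float:
    """Hypothetical 1-D PLDA-style criterion: projected between-class scatter divided
    by the power mean of the projected class variances (priors assumed to sum to 1)."""
    class_vars = [quad_form(w, cov) for cov in class_covs]
    return quad_form(w, between) / power_mean(class_vars, priors, m)


# Toy example (invented): two equally likely classes whose projected variances are 1 and 4.
w = [1.0, 0.0]
between = [[2.0, 0.0], [0.0, 0.0]]
covs = [[[1.0, 0.0], [0.0, 1.0]], [[4.0, 0.0], [0.0, 1.0]]]
priors = [0.5, 0.5]
for m in (1.0, 0.0, -1.0):
    print(m, power_criterion(w, between, covs, priors, m))
```

Because the power mean grows with `m`, the same projection scores differently as the control parameter changes, which is why selecting `m` matters.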
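The abstract notes that scatter-based methods implicitly assume one Gaussian per class and may discard useful dimensions when a class consists of several clusters. The toy example below, with invented data, shows one such failure mode. Class A has two clusters whose mean coincides with the mean of class B along x, so the between-class scatter along x is zero, even though x alone separates the classes. This sketch only demonstrates the problem; it does not reproduce the thesis's extensions of HDA and PLDA.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def class_mean(points: List[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def between_class_scatter(classes: List[List[Point]]) -> List[List[float]]:
    """S_B = sum_k p_k (m_k - m)(m_k - m)^T, with priors p_k = N_k / N."""
    total = sum(len(c) for c in classes)
    means = [class_mean(c) for c in classes]
    gx = sum(len(c) * mx for c, (mx, _) in zip(classes, means)) / total
    gy = sum(len(c) * my for c, (_, my) in zip(classes, means)) / total
    sb = [[0.0, 0.0], [0.0, 0.0]]
    for c, (mx, my) in zip(classes, means):
        p = len(c) / total
        dx, dy = mx - gx, my - gy
        sb[0][0] += p * dx * dx
        sb[0][1] += p * dx * dy
        sb[1][0] += p * dy * dx
        sb[1][1] += p * dy * dy
    return sb


# Toy data (invented): class A has two clusters at x = -3 and x = +3,
# class B sits between them at x = 0 with a slightly higher mean y.
class_a = [(-3.0, -0.1), (-3.0, 0.1), (3.0, -0.1), (3.0, 0.1)]
class_b = [(0.0, 0.4), (0.0, 0.6), (0.1, 0.5), (-0.1, 0.5)]

sb = between_class_scatter([class_a, class_b])
# Both class means have x = 0, so S_B has no variance along x ...
print("S_B =", sb)
# ... even though a simple rule on x alone separates the two classes.
rule_ok = all(abs(x) > 1.5 for x, _ in class_a) and all(abs(x) <= 1.5 for x, _ in class_b)
print("|x| > 1.5 separates A from B:", rule_ok)
```

With two classes, LDA keeps at most one direction, and here the only direction with nonzero between-class scatter is y, so the x dimension would be discarded.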
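The abstract contrasts minimizing the average pairwise classification error with minimizing the maximum one. The sketch below makes that contrast concrete under simplifying assumptions that are ours, not the thesis's. Each class is a one-dimensional Gaussian after projection, all classes share a standard deviation `sigma`, each pair is treated with equal priors, so its Bayes error is `Phi(-|mu_i - mu_j| / (2 sigma))`, and pairs are averaged without frequency weighting. The linear blend in `interpolated_objective` is a hypothetical form of interpolation, not necessarily the one proposed in the thesis.

```python
import math
from itertools import combinations
from typing import List, Tuple


def pairwise_error(mu_i: float, mu_j: float, sigma: float) -> float:
    """Bayes error between two equal-prior 1-D Gaussians with a shared sigma."""
    # Phi(-z) = 0.5 * erfc(z / sqrt(2)), with z = |mu_i - mu_j| / (2 sigma)
    z = abs(mu_i - mu_j) / (2.0 * sigma)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def error_stats(means: List[float], sigma: float) -> Tuple[float, float]:
    """Return (average, maximum) pairwise error over all class pairs."""
    errors = [pairwise_error(a, b, sigma) for a, b in combinations(means, 2)]
    return sum(errors) / len(errors), max(errors)


def interpolated_objective(means: List[float], sigma: float, alpha: float) -> float:
    """Hypothetical interpolation: (1 - alpha) * average + alpha * maximum."""
    avg, worst = error_stats(means, sigma)
    return (1.0 - alpha) * avg + alpha * worst


# Projected class means under two invented candidate projections (sigma = 1).
proj_p = [0.0, 0.1, 5.0, 10.0]  # two classes nearly coincide, the rest are far apart
proj_q = [0.0, 2.0, 4.0, 6.0]   # evenly spread classes
for name, means in (("P", proj_p), ("Q", proj_q)):
    avg, worst = error_stats(means, 1.0)
    print(name, "average:", avg, "maximum:", worst,
          "alpha=0.5:", interpolated_objective(means, 1.0, 0.5))
```

Here projection P has the lower average error but leaves one class pair almost inseparable, while Q has a slightly higher average and a much lower maximum. An average-error criterion prefers P; giving weight to the maximum error shifts the preference to Q.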