Acoustic Feature Transformation Based on Generalized Criteria for Speech Recognition

坂井, 誠; SAKAI, Makoto

http://hdl.handle.net/2237/14293

Name / File
k8967.pdf (839.7 kB)

Item type

Thesis or Dissertation(1)

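Publication date

2010-10-27

Title

Acoustic Feature Transformation Based on Generalized Criteria for Speech Recognition

Alternative title

音声認識における音響特徴変換の最適化基準の一般化に関する研究 (English translation: A Study on Generalizing the Optimization Criteria for Acoustic Feature Transformation in Speech Recognition)

Author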

坂井, 誠
SAKAI, Makoto

Access rights

open access

Access rights URI

http://purl.org/coar/access_right/c_abf2

Abstract

Description

This thesis deals with acoustic feature transformations in automatic speech recognition, with the goal of improving the basic performance of a speech recognizer. The aim of acoustic feature transformations is to reduce the dimensionality of long-term speech features without losing discriminative information among the different phonetic classes.

First, we focus on optimizing acoustic feature transformations using criteria that maximize the ratio of between-class scatter to within-class scatter. This approach is based on a family of functions of scatter or covariance matrices that is frequently used in practice. Typical methods in this approach include linear discriminant analysis (LDA), heteroscedastic linear discriminant analysis (HLDA), and heteroscedastic discriminant analysis (HDA). Although LDA, HLDA, and HDA are the most widely used in speech recognition, the connections between them have been disregarded so far. By developing a unified mathematical framework, we identify the close relationships between them and analyze them in detail. The framework, termed power LDA (PLDA), can describe various criteria by varying its control parameter, and it includes LDA, HLDA, and HDA as special cases. To determine a sub-optimal control parameter automatically, a control parameter selection method is also provided.

The effectiveness of combining acoustic feature transformations with discriminative training techniques for acoustic models is investigated, and an additional performance improvement is obtained. Unfortunately, the transformation methods mentioned above may result in an unexpected dimensionality reduction if the data in a certain class consist of several clusters, because they implicitly assume that data are generated from a single Gaussian distribution. This study provides extensions of HDA and PLDA to deal with class distributions that have several clusters.

Second, we focus on acoustic feature transformations that minimize a kind of classification error between different phonetic classes. Because the performance of speech recognition systems generally correlates strongly with the classification accuracy of the features, the features should be able to discriminate between different classes. Existing methods for this approach attempt to minimize the average classification error between different classes. Although minimizing the average classification error suppresses the total classification error, it cannot prevent considerable overlap between the distributions of some different classes with low frequencies. Such overlap is critical for speech recognition because there may be class pairs that have little or no discriminative information on each other. Instead of the average classification error, methods that minimize the maximum classification error are proposed so as to avoid considerable error between different classes. In addition, interpolation methods that minimize the maximum classification error while also minimizing the average classification error are proposed; these achieved the best results.
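As a rough illustration of the scatter-ratio criterion named in the abstract, the following is a minimal sketch of classical LDA only, written with NumPy. It is not the thesis's implementation and does not cover HLDA, HDA, or PLDA; the function name `lda_transform` and the toy two-class data are hypothetical and chosen purely for the example.

```python
import numpy as np

def lda_transform(X, y, n_components):
    """Return a (d, n_components) projection that maximizes the ratio of
    between-class scatter to within-class scatter (classical LDA)."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    # Eigenvectors of S_w^{-1} S_b with the largest eigenvalues span the
    # most discriminative directions.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_components]].real

# Toy data (an assumption for illustration): two Gaussian classes in three
# dimensions whose means differ only along the first axis.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0.0, 0.0, 0.0], 1.0, size=(200, 3)),
    rng.normal([3.0, 0.0, 0.0], 1.0, size=(200, 3)),
])
y = np.array([0] * 200 + [1] * 200)

B = lda_transform(X, y, n_components=1)  # 3-D features reduced to 1-D
Z = X @ B                                # projected features
```

In this toy setting the learned direction should lie close to the first axis, since that is the only axis along which the class means differ. The heteroscedastic methods discussed in the abstract relax the shared-covariance assumption that this sketch makes.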

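Description type

Abstract

Description

Nagoya University doctoral dissertation. Degree type: Doctor of Information Science (course doctorate). Date of degree conferral: September 30, 2010 (Heisei 22).

Description type

Other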

Language

eng

Resource type

Resource

http://purl.org/coar/resource_type/c_db06

Type

doctoral thesis

Bibliographic information

Issue date 2010-09-30

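Degree name

博士(情報科学) (Doctor of Information Science)

Degree-granting institution

Degree-granting institution identifier scheme

kakenhi

Degree-granting institution identifier

13901

Degree-granting institution name

名古屋大学
Nagoya University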

Academic year of degree conferral

2010

Date of degree conferral

2010-09-30

Degree number

甲第8967号 (Kō No. 8967)

Author version flag

Value

publisher

URI

Identifier

http://hdl.handle.net/2237/14293

Identifier type

HDL

