トピックを考慮した大規模文書情報源からのレコード抽出

張, 建偉; ZHANG, Jianwei; 石川, 佳治; ISHIKAWA, Yoshiharu; 北川, 博之; KITAGAWA, Hiroyuki



http://hdl.handle.net/2237/8981

Name / File
2007-tod-zhang.pdf (430.1 kB)

Item type

Journal Article

Publication date

2007-11-06

Title

トピックを考慮した大規模文書情報源からのレコード抽出

Alternative title

Record Extraction from Large-scale Text Resources Considering Topics

Authors

張, 建偉
ZHANG, Jianwei
石川, 佳治
ISHIKAWA, Yoshiharu
北川, 博之
KITAGAWA, Hiroyuki

Access rights

open access

Access rights URI

http://purl.org/coar/access_right/c_abf2

Rights

Rights information

Notice regarding the use of the work posted here: The copyright of this work belongs to the Information Processing Society of Japan (IPSJ), an incorporated association. This work is posted here with the permission of the IPSJ, the copyright holder. When using it, please comply with the Copyright Act and the IPSJ Code of Ethics. Notice for the use of this material The copyright of this material is retained by the Information Processing Society of Japan (IPSJ). This material is published on this web site with the agreement of the author (s) and the IPSJ. Please be complied with Copyright Law of Japan and the Code of Ethics of the IPSJ if any users wish to reproduce, make derivative work, distribute or make available to the public any part or whole thereof. All Rights Reserved, Copyright (C) Information Processing Society of Japan. Comments are welcome. Mail to address: editj<at>ipsj.or.jp, please.

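Abstract

Description

In recent years, research has been conducted on extracting records from large volumes of text documents. Record extraction faces the following challenges. First, when a large number of documents are targeted for information extraction, the processing cost is very high. Second, the extracted records do not necessarily match the topics the user is interested in. To address these challenges, this paper proposes a record extraction method for efficiently extracting information that matches the user's intent. For effective extraction, the method identifies the documents that are likely to contain information matching the user's intent. It aims to reduce processing cost by using the identified documents preferentially in the extraction process. It also achieves high extraction accuracy by extracting records with closely related content from those documents. Experimental results show that the proposed method reduces processing cost while preventing a drop in extraction accuracy.

Description type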

Abstract

Abstract

Description

In recent years, research on record extraction from large numbers of text documents has become popular. However, some problems remain. 1) When a large number of documents are targeted for information extraction, the process usually becomes very time-consuming. 2) Extracted records may not match the topics the user is interested in. To address these problems, this paper proposes a method for efficiently extracting records whose topics are relevant to the user's interest. To improve the efficiency of the information extraction system, our method identifies documents from which useful records are likely to be extracted. Those selected documents are processed first in order to reduce processing cost. Moreover, because user-desired records are likely to be extracted from these documents, high extraction accuracy is obtained. Our experiments show that our system reduces the processing cost while achieving high extraction accuracy.

Description type

Abstract

Publisher

Information Processing Society of Japan (情報処理学会)

Language

jpn

Resource type

Resource type URI

http://purl.org/coar/resource_type/c_6501

Type

journal article

Version type

VoR

Version type URI

http://purl.org/coar/version/c_970fb48d4fbd8a85

ISSN

Source identifier type

PISSN

Source identifier

0387-5806

Bibliographic information

ja : 情報処理学会論文誌

Vol. 48, No. SIG 14 (TOD 35), pp. 107-123, issued 2007-09

Format

application/pdf

Author version flag

Value

publisher

URI

Identifier

http://hdl.handle.net/2237/8981

Identifier type

HDL


Versions

Ver.1

2021-03-01 20:12:37.855795
