2024-03-29T06:11:40Z
https://nagoya.repo.nii.ac.jp/oai
oai:nagoya.repo.nii.ac.jp:00013116
2023-01-16T03:59:46Z
312:313:314
A Randomness Based Analysis on the Data Size Needed for Removing Deceptive Patterns
HARAGUCHI, Kazuya
YAGIURA, Mutsunori
BOROS, Endre
IBARAKI, Toshihide
open access
Copyright (C) 2008 IEICE
frequent/infrequent item sets
association rules
knowledge discovery
probabilistic analysis
We consider a data set in which each example is an n-dimensional Boolean vector labeled as true or false. A pattern is a co-occurrence of a particular value combination of a given subset of the variables. If a pattern appears frequently in the true examples and infrequently in the false examples, we consider it a good pattern. In this paper, we discuss the problem of determining the data size needed for removing "deceptive" good patterns; in a data set of a small size, many good patterns may appear superficially, simply by chance, independently of the underlying structure. Our hypothesis is that, in order to remove such deceptive good patterns, the data set should contain a greater number of examples than that at which a random data set contains few good patterns. We justify this hypothesis by computational studies. We also derive a theoretical upper bound on the needed data size in view of our hypothesis.
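The definitions in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method. The abstract does not say how "frequently" and "infrequently" are quantified, so the thresholds `min_true` and `max_false`, the fixed pattern size `degree`, and all function names are assumptions made for illustration only. The sketch enumerates patterns over a fixed number of variables, keeps those that are good under the assumed thresholds, and counts how many good patterns appear in a uniformly random data set of a given size.

```python
import itertools
import random


def matches(example, pattern):
    """Return True if the example takes the pattern's value on every chosen variable."""
    return all(example[i] == v for i, v in pattern)


def good_patterns(true_examples, false_examples, degree,
                  min_true=0.5, max_false=0.1):
    """List the patterns on `degree` variables that are 'good'.

    Assumption: a pattern is good if it matches at least `min_true` of the
    true examples and at most `max_false` of the false examples. These
    thresholds are illustrative, not taken from the paper.
    """
    n = len(true_examples[0])
    good = []
    for idxs in itertools.combinations(range(n), degree):
        for values in itertools.product((0, 1), repeat=degree):
            pattern = tuple(zip(idxs, values))
            t = sum(matches(x, pattern) for x in true_examples) / len(true_examples)
            f = sum(matches(x, pattern) for x in false_examples) / len(false_examples)
            if t >= min_true and f <= max_false:
                good.append(pattern)
    return good


def random_data(m, n, rng):
    """Draw m uniformly random n-dimensional Boolean vectors."""
    return [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(m)]


def count_good_in_random(m, n, degree, rng):
    """Count good patterns when labels carry no structure (both classes random)."""
    return len(good_patterns(random_data(m, n, rng), random_data(m, n, rng), degree))


if __name__ == "__main__":
    rng = random.Random(0)
    # With few examples, good patterns can appear purely by chance;
    # as m grows, a random data set should contain fewer of them.
    for m in (4, 8, 16, 32, 64):
        print(m, count_good_in_random(m, n=6, degree=2, rng=rng))
```

Under these assumptions, the smallest data size at which the random data set contains few good patterns corresponds to the quantity the hypothesis compares against; the paper's own definitions and bound may differ.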
Institute of Electronics, Information and Communication Engineers
2008-03-01
eng
journal article
VoR
http://hdl.handle.net/2237/15011
https://nagoya.repo.nii.ac.jp/records/13116
http://www.ieice.org/jpn/trans_online/index.html
0916-8532
IEICE transactions on information and systems
E91-D
3
781
788
https://nagoya.repo.nii.ac.jp/record/13116/files/503.pdf
application/pdf
394.8 kB
2018-02-20