Multiple Support for Large Sequence Databases by Mining Sequential Patterns
Abstract
Sequential pattern mining is an important model in data mining. Its mining algorithms discover all item sets in the data, and the rules derived from them, that satisfy the user-specified minimum support (minsup) and minimum confidence (mincon) constraints. Minsup controls the minimum number of data cases that a rule must cover. Mincon controls the minimum strength of the rule. Because a single minsup is used for the whole database, the model implicitly assumes that all items in the data are of the same nature and have similar frequencies. In many applications, however, some items appear very frequently in the data while others appear rarely. If minsup is set too high, rules that involve rare items will not be found. To find rules that involve both frequent and rare items, minsup has to be set very low. This may cause a combinatorial explosion, because the frequent items will be associated with one another in all possible ways. This dilemma is called the rare item problem. This paper proposes a technique to solve it. The technique allows the user to specify multiple minimum supports (MMS) to reflect the natures of the items and their varied frequencies in the database. Different rules may then need to satisfy different minimum supports, depending on which items they contain. Experimental results show that the technique is very effective.
Keywords: Minsup, Mincon
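
The rare item problem and the effect of multiple minimum supports can be illustrated with a small sketch. The toy transactions, the item names, the threshold values, and the helper functions below are all hypothetical. The rule that an item set must meet the lowest per-item threshold among its members is an assumption made for illustration; the abstract does not state how per-item supports are combined. The sketch uses unordered item sets and a naive exhaustive enumeration, not the mining algorithm proposed in the paper.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def itemset_minsup(itemset, mis):
    # Assumption: the threshold for an item set is the lowest
    # per-item minimum support among its members.
    return min(mis[item] for item in itemset)


def frequent_itemsets(transactions, mis, max_size=2):
    """Naively enumerate item sets up to max_size and keep the frequent ones."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            sup = support(s, transactions)
            if sup >= itemset_minsup(s, mis):
                result[s] = sup
    return result


# Hypothetical toy database: two frequent items and two rare items.
transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread"},
    {"food_processor", "cooking_pan"},
    {"bread", "milk"},
]

# A single minsup of 0.5 for every item: the rare pair is lost.
uniform = {item: 0.5 for t in transactions for item in t}
single = frequent_itemsets(transactions, uniform)

# Per-item minimum supports: rare items get a lower threshold.
mis = {"bread": 0.5, "milk": 0.5, "food_processor": 0.1, "cooking_pan": 0.1}
multi = frequent_itemsets(transactions, mis)

for name, found in (("single minsup", single), ("multiple minsups", multi)):
    print(name)
    for s, sup in sorted(found.items(), key=lambda kv: sorted(kv[0])):
        print("  ", sorted(s), sup)
```

With the single threshold, only the frequent items and their pair survive. With per-item thresholds, the rare pair is also retained without lowering the threshold for the frequent items.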