Adaptive Hierarchical Clustering and Batch-free Top-K Sequential Pattern Mining for Data Streams
DOI: https://doi.org/10.56042/jsir.v84i5.5439
Keywords: Inverted tree, Maximum pattern length, Minimum support, Prediction, Subsequences
Abstract
Sequential Pattern Mining (SPM) is a challenging task in data streams because of the large memory and computational costs required to obtain accurate results. In traditional batch-based processing, sequential patterns mined from the target stream are lost when batches are processed independently, because a pattern's frequency is determined locally within each batch. If a pattern is frequent in the stream as a whole but its occurrences are spread across several batches, it may never become frequent within any single batch and is therefore pruned. To address this issue, sequences are clustered by similarity using Adaptive Hierarchical Clustering (AHC), and a Batch-Free Top-K Sequential Pattern Mining (BFTKSPM) algorithm is proposed to mine approximate sequential patterns over data streams. The BFTKSPM algorithm processes the data stream continuously, without dividing it into batches. The top-k sequential patterns are extracted from the data stream and maintained in an inverted tree structure. Experiments on benchmark data-stream datasets show that the proposed algorithm outperforms existing batch-based methods in terms of execution time, memory usage, precision, recall, and F1-score.
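The pattern-loss problem described above can be illustrated with a small, hypothetical example. This is a minimal sketch, not the authors' AHC or BFTKSPM algorithm: it omits clustering and the inverted tree, counts supports exactly rather than approximately, and the names `subsequences`, `batch_based_counts`, `StreamTopK`, the batch size of 3, and the per-batch threshold `min_count` are assumptions chosen for illustration. It contrasts a batch-based miner that prunes patterns below a local threshold with a batch-free counter that updates cumulative supports one sequence at a time and reports the top-k patterns.

```python
from collections import Counter
from itertools import combinations
import heapq


def subsequences(seq: tuple[str, ...], length: int) -> set[tuple[str, ...]]:
    """Distinct subsequences of a given length (item order kept, gaps allowed).

    Returning a set means each sequence adds at most 1 to a pattern's support.
    """
    return {tuple(seq[i] for i in idx) for idx in combinations(range(len(seq)), length)}


def batch_based_counts(stream, batch_size: int, min_count: int, length: int) -> Counter:
    """Hypothetical batch-based baseline: supports are counted inside each batch,
    and a pattern below min_count in that batch is pruned, so those occurrences are lost."""
    kept = Counter()
    for start in range(0, len(stream), batch_size):
        local = Counter()
        for seq in stream[start:start + batch_size]:
            local.update(subsequences(seq, length))
        for pattern, count in local.items():
            if count >= min_count:
                kept[pattern] += count
    return kept


class StreamTopK:
    """Hypothetical batch-free counter: every arriving sequence updates cumulative
    supports immediately; the k most frequent patterns can be queried at any time."""

    def __init__(self, k: int, length: int):
        self.k = k
        self.length = length
        self.counts = Counter()

    def add(self, seq: tuple[str, ...]) -> None:
        self.counts.update(subsequences(seq, self.length))

    def top_k(self):
        # Ties are broken by pattern so the result is deterministic.
        return heapq.nlargest(self.k, self.counts.items(), key=lambda kv: (kv[1], kv[0]))


# Toy stream of 9 sequences, read as three batches of 3.
# ("a", "b") occurs once per batch; ("x", "y") occurs twice, both in batch 1.
STREAM = [
    ("a", "b"), ("x", "y"), ("x", "y"),
    ("a", "c", "b"), ("c",), ("d",),
    ("a", "b", "d"), ("e",), ("f",),
]

print(batch_based_counts(STREAM, batch_size=3, min_count=2, length=2))
# Counter({('x', 'y'): 2})  -> ("a", "b") was pruned in every batch

miner = StreamTopK(k=2, length=2)
for sequence in STREAM:
    miner.add(sequence)
print(miner.top_k())
# [(('a', 'b'), 3), (('x', 'y'), 2)]
```

Here ("a", "b") occurs in one sequence of each batch, so it never reaches `min_count` locally and the batch-based miner loses it, while the batch-free counter ranks it first with a support of 3. This sketch keeps an exact count for every distinct subsequence, so its memory grows with the stream; it does not reproduce the memory behavior of the approximate, inverted-tree approach described in the abstract.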