An Efficient Novel Data Mining Approach to Calculate Outlier from Large Dataset Using Advance PCA.
Abstract
Outlier detection is an important task in data mining. Many detection techniques exist, but most are implemented in batch mode, which means they work well only on small datasets. Such methods therefore cannot easily be applied to large-scale problems without heavy computation and memory requirements. Applications such as intrusion detection or credit card fraud detection require a powerful and efficient framework to identify outlying data instances. In this paper, we propose an Advance Principal Component Analysis (PCA) algorithm that detects outliers in a large amount of data via an online updating technique. Unlike previous PCA-based approaches, we do not store the entire data matrix or covariance matrix, so the approach is suited to online or large-scale problems. By oversampling the target instance and extracting the principal direction of the data, the proposed advance PCA determines the anomaly of the target instance according to the variation of the resulting dominant eigenvector.
Keywords: Anomaly Detection, Oversampling, Online Updating, PCA.
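
As an illustration of the idea summarized in the abstract, the following is a minimal sketch and not the implementation proposed in this paper. It assumes that the target instance is duplicated a fixed fraction of the dataset size, that the dominant eigenvector is obtained by power iteration (so no covariance matrix is formed), and that the outlier score is one minus the absolute cosine between the dominant eigenvectors before and after oversampling. The names `top_direction`, `oversampling_score`, and `ratio` are hypothetical, and the sketch runs in batch rather than online mode.

```python
import math
import random


def top_direction(rows, extra=None, copies=0.0, iters=300):
    """Dominant eigenvector of the scatter matrix of `rows` plus `copies`
    duplicates of `extra`, via power iteration (no covariance matrix)."""
    n, d = len(rows), len(rows[0])
    weight = copies if extra is not None else 0.0
    # Mean of the (possibly oversampled) dataset.
    total = [sum(r[j] for r in rows) for j in range(d)]
    if extra is not None:
        total = [t + weight * e for t, e in zip(total, extra)]
    mean = [t / (n + weight) for t in total]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    extra_c = None if extra is None else [e - m for e, m in zip(extra, mean)]

    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        # w = sum_i c_i (c_i . v), i.e. scatter matrix times v.
        w = [0.0] * d
        for c in centered:
            p = sum(ci * vi for ci, vi in zip(c, v))
            for j in range(d):
                w[j] += p * c[j]
        if extra_c is not None:
            # Contribution of the duplicated target instance.
            p = weight * sum(ci * vi for ci, vi in zip(extra_c, v))
            for j in range(d):
                w[j] += p * extra_c[j]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def oversampling_score(rows, target, ratio=0.1):
    """Score in [0, 1]: how far oversampling `target` rotates the
    dominant eigenvector (assumed scoring rule, see lead-in)."""
    u = top_direction(rows)
    u_over = top_direction(rows, extra=target, copies=ratio * len(rows))
    cosine = abs(sum(a * b for a, b in zip(u, u_over)))
    return 1.0 - min(1.0, cosine)


# Synthetic data spread mainly along the first axis (illustrative only).
data_rng = random.Random(1)
data = [[data_rng.gauss(0, 5), data_rng.gauss(0, 0.5)] for _ in range(200)]
inlier_score = oversampling_score(data, data[0])
outlier_score = oversampling_score(data, [0.0, 20.0])
print(inlier_score, outlier_score)
```

In this sketch, an instance lying far from the main direction of the data should rotate the dominant eigenvector noticeably when duplicated, whereas a typical instance should leave it almost unchanged, so a larger score suggests a more anomalous instance.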