An Efficient Novel Data Mining Approach to Calculate Outlier from Large Dataset Using Advance PCA.

Priyanka R Patil

Authors

Priyanka R Patil, Pune University, Department of Computer Engineering, Sandip Foundation, Nashik, India

Abstract

Outlier detection is an important task in data mining. Many detection techniques exist, but most are implemented in batch mode and therefore work well only on small datasets. Such methods cannot easily be applied to large-scale problems without incurring high computation and memory costs. Applications such as intrusion detection or credit card fraud detection require a powerful and efficient framework to identify outlier data instances. In this paper, we propose an Advance Principal Component Analysis algorithm that detects outliers in a large amount of data via an online updating technique. Unlike previous principal component analysis (PCA)-based approaches, we do not store the entire data matrix or covariance matrix, so the approach is suitable for online or large-scale problems. By oversampling the target instance and extracting the principal direction of the data, the proposed advance PCA determines the anomaly of the target instance according to the variation of the resulting dominant eigenvector.

Keywords: Anomaly Detection, Oversampling, Online updating, PCA.
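
The oversampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and it does not reproduce the online updating technique: for simplicity it assumes the data fit in memory and forms a small d x d moment matrix, which the proposed method avoids. The function names `dominant_direction` and `oversampling_pca_scores`, the `ratio` parameter, and the score definition (one minus the absolute cosine between the original and the oversampled dominant eigenvectors) are assumptions made for this illustration.

```python
import numpy as np


def dominant_direction(cov):
    """Return the unit eigenvector of a symmetric matrix with the largest eigenvalue."""
    # eigh returns eigenvalues in ascending order, so the last column is dominant.
    _, vecs = np.linalg.eigh(cov)
    return vecs[:, -1]


def oversampling_pca_scores(X, ratio=0.1):
    """Score each row of X by how much oversampling it rotates the principal direction.

    Hypothetical helper for illustration: `ratio` is the assumed number of
    duplicated copies of the target instance, as a fraction of the dataset size.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mean = X.mean(axis=0)
    second_moment = X.T @ X / n  # estimate of E[x x^T]
    u = dominant_direction(second_moment - np.outer(mean, mean))

    # Adding ratio * n copies of x to n points gives the copies this weight.
    w = ratio / (1.0 + ratio)
    scores = np.empty(n)
    for i, x in enumerate(X):
        # Mean and second moment of the oversampled set, without duplicating data.
        mean_t = (1.0 - w) * mean + w * x
        second_t = (1.0 - w) * second_moment + w * np.outer(x, x)
        u_t = dominant_direction(second_t - np.outer(mean_t, mean_t))
        # Eigenvectors have arbitrary sign, so compare with the absolute cosine.
        scores[i] = 1.0 - abs(u @ u_t)
    return scores


# Usage: points spread along the first axis, plus one instance far off that axis.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 0.05 * rng.normal(size=200)])
X = np.vstack([X, [0.0, 5.0]])
scores = oversampling_pca_scores(X, ratio=0.1)
print("index of highest score:", int(np.argmax(scores)))
```

In this sketch, duplicating a typical instance barely changes the dominant eigenvector, while duplicating an instance that lies far from the main direction of the data rotates it noticeably, so a larger score suggests a more anomalous instance.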
