      Feature Engineering and Resampling Strategies for Fund Transfer Fraud With Limited Transaction Data and a Time-Inhomogeneous Modi Operandi

          Most cited references (45)

          SMOTE for high-dimensional class-imbalanced data

          Background: Classification using class-imbalanced data is biased in favor of the majority class. The bias is even larger for high-dimensional data, where the number of variables greatly exceeds the number of samples. The problem can be attenuated by undersampling or oversampling, which produce class-balanced data. Generally, undersampling is helpful, while random oversampling is not. The Synthetic Minority Oversampling TEchnique (SMOTE) is a very popular oversampling method that was proposed to improve on random oversampling, but its behavior on high-dimensional data has not been thoroughly investigated. In this paper we investigate the properties of SMOTE from a theoretical and empirical point of view, using simulated and real high-dimensional data.

          Results: While in most cases SMOTE seems beneficial with low-dimensional data, it does not attenuate the bias towards classification in the majority class for most classifiers when data are high-dimensional, and it is less effective than random undersampling. SMOTE is beneficial for k-NN classifiers on high-dimensional data if the number of variables is reduced by performing some type of variable selection; we explain why, otherwise, the k-NN classification is biased towards the minority class. Furthermore, we show that on high-dimensional data SMOTE does not change the class-specific mean values, while it decreases the data variability and introduces correlation between samples. We explain how our findings affect class prediction for high-dimensional data.

          Conclusions: In practice, in the high-dimensional setting only k-NN classifiers based on the Euclidean distance seem to benefit substantially from the use of SMOTE, provided that variable selection is performed before using SMOTE; the benefit is larger if more neighbors are used. SMOTE for k-NN without variable selection should not be used, because it strongly biases the classification towards the minority class.
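
          The interpolation step that SMOTE relies on can be illustrated with a minimal sketch. This is not code from the cited paper or from any library; the function name `smote_sketch` and its parameters (`minority`, `n_new`, `k`, `seed`) are hypothetical, and the sketch assumes purely numeric features, Euclidean distance, and at least two minority samples.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points built from a list of minority-class points.

    Hypothetical helper for illustration: each synthetic point lies on the
    segment between a randomly chosen minority point and one of its k
    nearest minority neighbours (Euclidean distance).
    """
    rng = random.Random(seed)
    n = len(minority)
    k = min(k, n - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        base = minority[rng.randrange(n)]
        # Rank the other minority points by distance to the base point.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic
```

          Because every synthetic point is a convex combination of two existing minority points, it cannot fall outside the range already spanned by the minority class in any coordinate. That is consistent with the abstract's remark that SMOTE decreases data variability, although the sketch itself does not demonstrate the paper's high-dimensional results.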

            Statistical Fraud Detection: A Review (with Comments and Rejoinder)

              A unifying view on dataset shift in classification

                Author and article information

                Journal
                IEEE Access
                Institute of Electrical and Electronics Engineers (IEEE)
                ISSN: 2169-3536
                2022
                Volume: 10
                Pages: 86101-86116
                Affiliations
                [1] Institute of Finance, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
                [2] Department of Information Management and Finance, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
                [3] College of Artificial Intelligence, Yango University, Fuzhou, China
                [4] Institute of Computer Science and Engineering, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
                [5] Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan
                [6] Department of Information and Finance Management, National Taipei University of Technology, Taipei, Taiwan
                Article
                DOI: 10.1109/ACCESS.2022.3199425
                5e172492-c685-41ca-851f-fc452b6bb503
                © 2022

                https://creativecommons.org/licenses/by/4.0/legalcode
