Detecting the Probability of Fraud in Interim Financial Statements Using Machine Learning Models: Do Correlation-Based Analysis and Principal Component Analysis for Dimensionality Reduction Matter?

Document Type: Original Article

Author

Accounting Department, Faculty of Business, Alexandria University, Alexandria, Egypt; Accounting and Information Technology Department, Faculty of International Business and Humanities, Egypt-Japan University of Science and Technology (EJUST), New Borg Al-Arab, Alexandria, Egypt

Abstract

This study compares machine learning (ML) models, datasets, and dimensionality reduction techniques to determine how effectively they detect the probability of interim financial statement fraud (FSF). Following the design science research (DSR) approach, the study applies quantitative methods to secondary data drawn from the financial reports published by non-financial firms listed on the Egyptian Stock Exchange from 2015 to 2022. The feature set comprises financial ratios reflecting each firm’s leverage, profitability, liquidity, and efficiency. Fraud indicators are based on the Beneish M-score model, which signals the likelihood of earnings manipulation. The findings reveal that the Random Forest classifier outperforms the other classifiers, especially on the oversampled dataset after preprocessing with the correlation-based dimensionality reduction method. The study aims to benefit investors, stakeholders, auditors, regulatory bodies, fraud examiners, and academics who are interested in developing better methods to detect the probability of FSF. It introduces ML models and dimensionality reduction techniques that have not previously been applied to detect the probability of FSF in an emerging market context. The research provides evidence on which dimensionality reduction techniques achieve the best detection results. Additionally, the study introduces new solutions to the data imbalance problem. The results can therefore help regulatory bodies and practitioners detect managerial opportunistic behavior more accurately and in a timely manner, and they provide a foundation for further academic research in the field.
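
As a rough illustration of two preprocessing steps named in the abstract, the sketch below shows one common form of correlation-based feature reduction (dropping a feature that is highly correlated with one already kept) and simple random oversampling of the minority class. It is not the study's implementation: the 0.9 threshold, the feature names, the sample values, and the helper names `pearson`, `drop_correlated`, and `random_oversample` are all assumptions made for illustration, and the abstract does not state which oversampling technique was used.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear relationship
    return cov / sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below threshold.

    `features` maps a feature name to its column of values.
    The threshold is an assumed value, not one taken from the study.
    """
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical ratios for four firm-quarters (values invented for illustration).
ratios = {
    "leverage": [0.2, 0.4, 0.6, 0.8],
    "debt_ratio": [0.21, 0.39, 0.61, 0.79],   # nearly duplicates leverage
    "current_ratio": [1.5, 0.9, 2.1, 1.2],
}
selected = drop_correlated(ratios)

rows = list(zip(*(ratios[name] for name in selected)))
flags = [0, 0, 0, 1]  # 1 = flagged as likely manipulator (assumed labels)
balanced_rows, balanced_flags = random_oversample(rows, flags)
```

In this toy data, `debt_ratio` is removed because it tracks `leverage` almost exactly, and the single flagged observation is duplicated until both classes are the same size. The balanced rows could then be passed to a classifier such as Random Forest.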

Keywords