Evaluation of Financial Data Processing Life Cycle for Risk Prediction: A Survey

M.V. Narayana; T. Sumallika; M. Vijaya Sudha; P.V.M. Raju

doi:10.14445/22312803/IJCTT-V72I12P111

Research Article | Open Access

Volume 72 | Issue 12 | Year 2024 | Article Id. IJCTT-V72I12P111 | DOI: https://doi.org/10.14445/22312803/IJCTT-V72I12P111


Received: 31 Oct 2024 | Revised: 26 Nov 2024 | Accepted: 11 Dec 2024 | Published: 30 Dec 2024

Citation:

M.V. Narayana, T. Sumallika, M. Vijaya Sudha, P.V.M. Raju, "Evaluation of Financial Data Processing Life Cycle for Risk Prediction: A Survey," International Journal of Computer Trends and Technology (IJCTT), vol. 72, no. 12, pp. 89-99, 2024. Crossref, https://doi.org/10.14445/22312803/IJCTT-V72I12P111

Abstract

Financial risk analysis is integral to financial planning and investment at both organizational and personal levels. Because financial trends fluctuate widely, many investors factor risk prediction into the construction of their investment portfolios. Predicting risk for financial assets is highly challenging because financial trends depend on a range of technical and non-technical factors. Hence, computer-aided processes are becoming popular for risk prediction. Recent advances in machine learning algorithms have improved the accuracy of these predictions. Such algorithms work in two phases: training a model and deploying it to predict. Nonetheless, the available machine learning algorithms for risk prediction have many limitations. These limitations primarily concern the correctness of the data used to build the predictive model: the data are collected from various sources, sometimes with human intervention, and are prone to insufficiency and incorrectness. Hence, frameworks or processes for financial prediction must perform an additional step, data pre-processing, before carrying out the actual task of risk prediction. In the recent past, a good number of research works have aimed to predict financial risks with higher accuracy by designing a complete life cycle of the data for financial predictions, from data pre-processing through to the conclusion of the risk analysis. Nevertheless, these works are criticized for not achieving the best possible accuracy and for compromising on time complexity, which can be a critical measure of performance in financial risk analysis. Hence, this work analyzes the various strategies and works for data pre-processing and prediction on financial data. It contributes to the research domain by analyzing these strategies mathematically, algorithmically and in terms of results, to identify the unsolved challenges in this domain.

Keywords

Computational modelling, Data analytics, Data collection, Machine Learning, Risk prediction.
