Predicting the Probability of Exceeding Critical System Thresholds
Abstract
In this paper we show how regression modelling can be combined with a special data transformation technique that improves model precision and produces several “preliminary” estimates of the target value. These preliminary estimates can be used to construct interval estimates of the target value as well as to predict the probability that it has exceeded, or will exceed, arbitrary predefined thresholds. Our approach can be combined with various regression models and applied in many domains that need to estimate the probability of system malfunctions or other hazardous states brought about by system variables exceeding critical safety thresholds. We rigorously derive the formulas for the probability of crossing an upper bound and a lower bound, both separately (one-sided intervals) and together (a two-sided interval), and verify the approach experimentally on a real dataset from the electric power industry.
Problems in Programming 2018; 2-3: 189-196
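The abstract does not detail the transformation itself, but the idea of turning several “preliminary” estimates into threshold-crossing probabilities can be sketched. The snippet below is a minimal illustration, not the paper’s method: it assumes we already have a handful of preliminary estimates (hypothetical values), fits a normal distribution to them, and computes the one-sided and two-sided crossing probabilities.

```python
# Sketch: probability that a target crosses predefined thresholds,
# derived from several "preliminary" estimates of its value.
# Assumption (not from the paper): the estimates are roughly normal,
# so we fit N(mu, sigma) to them and read probabilities off its CDF.
from statistics import NormalDist, mean, stdev

def exceedance_probabilities(estimates, lower=None, upper=None):
    """Return (P(below lower), P(above upper), P(outside [lower, upper]))
    under a normal fit to the preliminary estimates."""
    dist = NormalDist(mean(estimates), stdev(estimates))
    p_lower = dist.cdf(lower) if lower is not None else None
    p_upper = 1.0 - dist.cdf(upper) if upper is not None else None
    p_outside = None
    if lower is not None and upper is not None:
        # Two-sided interval: crossing either bound.
        p_outside = p_lower + p_upper
    return p_lower, p_upper, p_outside

# Hypothetical preliminary estimates of a monitored system variable:
prelim = [98.2, 101.5, 99.8, 100.9, 100.1]
p_lo, p_hi, p_out = exceedance_probabilities(prelim, lower=95.0, upper=105.0)
```

In practice the preliminary estimates would come from the paper’s transformation combined with a regression model; here they are simply given, and the normality assumption stands in for whatever distributional treatment the paper derives.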
DOI: https://doi.org/10.15407/pp2018.02.189