Predicting the Probability of Exceeding Critical System Thresholds

P. Krammer, M. Kvassay, L. Hluchý

Abstract


In this paper we show how regression modelling can be combined with a data transformation technique that improves model precision and produces several “preliminary” estimates of the target value. These preliminary estimates can be used to construct interval estimates of the target value and to predict the probability that it has exceeded, or will exceed, arbitrary predefined thresholds. The approach can be combined with various regression models and applied in any domain that needs to estimate the probability of system malfunctions or other hazardous states caused by system variables exceeding critical safety thresholds. We rigorously derive the formulas for the probability of crossing an upper bound and a lower bound, both separately (one-sided intervals) and together (a two-sided interval), and verify the approach experimentally on a real dataset from the electric power industry.
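The idea of turning several preliminary estimates into threshold-crossing probabilities can be illustrated with a minimal sketch. It is not the formulas derived in the paper: it assumes, purely for illustration, that the preliminary estimates behave like a sample from a normal distribution, and the function name `exceedance_probabilities` and its arguments are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def exceedance_probabilities(estimates, lower, upper):
    """Estimate the probability that the target lies outside [lower, upper].

    ASSUMPTION (not taken from the paper): the preliminary estimates are
    treated as a sample from a normal distribution whose mean and standard
    deviation are the sample mean and sample standard deviation.
    """
    if len(estimates) < 2:
        raise ValueError("at least two preliminary estimates are required")
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")

    sigma = stdev(estimates)
    if sigma == 0:
        raise ValueError("estimates are identical; spread cannot be estimated")

    dist = NormalDist(mu=mean(estimates), sigma=sigma)
    p_below = dist.cdf(lower)          # one-sided: crossing the lower bound
    p_above = 1.0 - dist.cdf(upper)    # one-sided: crossing the upper bound
    return {
        "below_lower": p_below,
        "above_upper": p_above,
        "outside": p_below + p_above,  # two-sided: leaving the interval
    }


if __name__ == "__main__":
    # Hypothetical preliminary estimates of a monitored variable.
    result = exceedance_probabilities([9.0, 10.0, 11.0], lower=8.0, upper=12.0)
    for name, probability in result.items():
        print(f"{name}: {probability:.4f}")
```

The two one-sided probabilities correspond to the separate upper-bound and lower-bound cases, and their sum to the two-sided interval. With only a few preliminary estimates, a heavier-tailed distribution may be more appropriate than the normal assumption used here.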

Problems in Programming, 2018; 2-3: 189-196


Keywords


regression; data transformation; interval estimation; probability; statistical modelling





DOI: https://doi.org/10.15407/pp2018.02.189
