Predicting the Probability of Exceeding Critical System Thresholds

P. Krammer, M. Kvassay, L. Hluchý


In this paper we show how regression modelling can be combined with a special kind of data transformation that improves model precision and produces several “preliminary” estimates of the target value. These preliminary estimates can be used to construct interval estimates of the target value and to predict the probability that it has exceeded, or will exceed, arbitrary predefined thresholds. Our approach can be combined with various regression models and applied in many domains that need to estimate the probability of system malfunctions or other hazardous states caused by system variables exceeding critical safety thresholds. We rigorously derive the formulas for the probability of crossing an upper bound and a lower bound, both separately (one-sided intervals) and together (a two-sided interval), and verify the approach experimentally on a real dataset from the electric power industry.

Problems in programming 2018; 2-3: 189-196 


regression; data transformation; interval estimation; probability; statistical modelling






