Application of machine learning in software engineering: an overview

O.H. Moroz, H.B. Moroz


Today, software is one of the main technologies contributing to the development of society. Its quality is therefore a major requirement for both the global software industry and software engineering, which deals with all aspects of improving the quality and reliability of software products at every stage of their life cycle. Artificial intelligence methods are becoming increasingly relevant for solving software engineering problems. This article briefly describes machine learning methods such as artificial neural networks, support vector machines, decision trees, inductive logic programming, and others. It also presents examples of applying these methods to forecasting and quality assessment problems in software engineering and gives recommendations for applying machine learning algorithms to software engineering problems. The review will be useful to researchers and practitioners as a starting point because it identifies important and promising areas of research. This should ultimately lead to more effective solutions to software engineering problems and to better, more reliable, and more cost-effective software products.

Problems in programming 2019; 4: 92-110


software engineering; software; machine learning; neural networks; support vector machine; decision trees



