Application of machine learning in software engineering: an overview
Abstract
Today, software is one of the main technologies contributing to the development of society. Its quality is therefore a major requirement both for the global software industry and for software engineering, which deals with all aspects of improving the quality and reliability of software products at every stage of their life cycle. Artificial intelligence methods are becoming increasingly relevant for solving software engineering problems. The article briefly describes machine learning methods such as artificial neural networks, support vector machines, decision trees, inductive logic programming and others. It also presents examples of applying these methods to forecasting and quality assessment problems in software engineering, and gives recommendations for applying machine learning algorithms to software engineering problems. The review will be useful to researchers and practitioners as a starting point because it identifies important and promising areas of research. This should ultimately lead to more effective solutions to software engineering problems and to better, more reliable and more cost-effective software products.
Problems in programming 2019; 4: 92-110
DOI: https://doi.org/10.15407/pp2019.04.092