Generation of multipurpose formal models from legacy code

S.V. Potiyenko, A.V. Kolchin


This paper proposes a method for generating formal models from legacy system code. The models are intended to be applicable to different tasks, such as automatic generation of executable tests, translation to modern programming languages, and reverse engineering. The method aims to reduce the complexity of state space search and of formula satisfiability checking compared with direct modeling of the code, and to help in understanding legacy systems and reimplementing them in modern technologies. We focus on formalizing the Cobol memory model, since Cobol is the most common language in legacy systems. The formal model is an attribute transition system with control flow. We propose an algorithm for building enumerated types for variables whose usage fulfills certain conditions, including a procedure for translating numeric variables into enumerated ones. We consider the problem of translating non-comparable structures that overlap in memory (the REDEFINES clause in Cobol) or that are copied to or compared with each other. In contrast to the common approach of using union semantics (like the union construct in C++), we describe a method of decomposing structure fields that avoids the drawbacks of unions and minimizes the use of the bytewise approach. We illustrate the method on examples of structures with simple fields as well as with arrays. We also give examples of implementing the bytewise approach in Java and C++ for those variables that cannot be represented as enumerated or numeric attributes. We applied this work to test generation for medium-sized projects (up to 100 000 lines of code), where the method proved efficient; the generated formal models were also used for debugging a Cobol-to-Java translator and for business rule extraction.
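To make the bytewise approach mentioned above more concrete, here is a minimal Java sketch of two overlapping views of the same storage, as arises with a Cobol REDEFINES clause. It is not taken from the paper: the class name `OverlayRecord`, the 8-byte layout (an alphanumeric field overlaid by year, month and day digits), and the use of ASCII for digit characters are all assumptions made for illustration, and only unsigned values are handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical example: two Cobol-style views over the same 8 bytes. */
final class OverlayRecord {
    // Shared storage for an item and the item that redefines it.
    private final byte[] storage = new byte[8];

    OverlayRecord() {
        Arrays.fill(storage, (byte) ' '); // assumed initial value: spaces
    }

    // View 1: the whole area as one alphanumeric field (like PIC X(8)).
    void setText(String value) {
        byte[] src = value.getBytes(StandardCharsets.US_ASCII);
        Arrays.fill(storage, (byte) ' '); // pad with spaces on the right
        System.arraycopy(src, 0, storage, 0, Math.min(src.length, storage.length));
    }

    String getText() {
        return new String(storage, StandardCharsets.US_ASCII);
    }

    // View 2: year in bytes 0..3, month in bytes 4..5, day in bytes 6..7.
    int getYear()  { return readDigits(0, 4); }
    int getMonth() { return readDigits(4, 2); }
    int getDay()   { return readDigits(6, 2); }

    void setMonth(int month) { writeDigits(4, 2, month); }

    // Decode unsigned decimal digits stored one character per byte.
    private int readDigits(int offset, int length) {
        int result = 0;
        for (int i = offset; i < offset + length; i++) {
            int digit = storage[i] - '0';
            if (digit < 0 || digit > 9) {
                throw new IllegalStateException("non-numeric byte at offset " + i);
            }
            result = result * 10 + digit;
        }
        return result;
    }

    // Encode an unsigned value as zero-padded digits; high digits are truncated.
    private void writeDigits(int offset, int length, int value) {
        for (int i = offset + length - 1; i >= offset; i--) {
            storage[i] = (byte) ('0' + value % 10);
            value /= 10;
        }
    }
}
```

Writing through one view is immediately visible through the other, which is the behavior a translator has to preserve. The cost is that every field access becomes byte decoding, which is presumably why the abstract treats this as a fallback for variables that cannot be modeled as enumerated or numeric attributes.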

 Problems in programming 2022; 3-4: 42-50


translation; formal model; legacy systems




