Knowledge discovery in data and causal models in analytical informatics

O.S. Balabanov

Abstract


The methodology of inductive inference of causal models is briefly reviewed. We argue that causal networks recovered from data can adequately describe the structure of influences in the environment (object) under study. A causal model is precisely what is required to predict the effect of an intervention on the object. We outline the preconditions and requirements that the data collection process must meet in order to obtain an adequate causal network. The inference method takes as input a multivariate statistical data sample measured under a unified scheme. We consider the independence-based approach to causal inference. Methods of this approach are correct and can perform well in the presence of hidden variables. The output of such a method usually contains some edges whose orientation is not fully determined. Uncertainty of this kind is inherent in the problem setting and does not compromise the adequacy of the model. We suggest a way to strengthen an inference algorithm with a set of resolutions that reduce the search space for separating sets and thereby focus the process of edge verification. The proposed modification is based on systematic use of the concept of a locally minimal separating set and of Markov properties. The efficiency of the developed algorithms (the ‘Razor’ series) is demonstrated by control experiments and a case study. The distinction between predicting a causal effect (i.e. the effect of an active experiment) and traditional prediction in data analysis is clarified. Some problems of parameter estimation are presented. Some opportunities to predict a causal effect when the model is incompletely identified are illustrated. We point out a few ideas and new research trends that can enrich the analyst’s ability to verify or identify a model.
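
As a minimal illustration of the independence-based idea mentioned above (and not of the 'Razor' algorithms themselves), the following sketch simulates a hypothetical linear-Gaussian chain X -> Z -> Y and checks that X and Y are dependent marginally but become nearly uncorrelated once Z is conditioned on, so that {Z} acts as a separating set and the edge X - Y would be rejected. The variable names, coefficients, sample size, and the use of partial correlation as the independence test are all assumptions made for this example.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(a, b, c):
    """First-order partial correlation of a and b given c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def simulate_chain(n=5000, seed=0):
    """Draw n samples from the hypothetical chain X -> Z -> Y."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [0.8 * xi + rng.gauss(0, 1) for xi in x]  # Z depends on X only
    y = [0.8 * zi + rng.gauss(0, 1) for zi in z]  # Y depends on Z only
    return x, z, y


x, z, y = simulate_chain()
r_xy = pearson(x, y)                # marginal dependence between X and Y
r_xy_given_z = partial_corr(x, y, z)  # dependence remaining after conditioning on Z

print(f"corr(X, Y)     = {r_xy:.3f}")
print(f"corr(X, Y | Z) = {r_xy_given_z:.3f}")
```

In a real analysis the near-zero partial correlation would be assessed with a proper statistical test and a chosen significance level; the sketch only shows why finding a separating set justifies removing an edge.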

Problems in programming 2017; 3: 96-112


Keywords


causal network; model inference from data; Markov properties; conditional independence; structure of dependencies; causal effect; edge orientation; d-separation




DOI: https://doi.org/10.15407/pp2017.03.096
