About

The Bayesian Knowledge Discovery Project was a joint effort between The Knowledge Media Institute and the Department of Statistics.

Overview

The Information Society consumed information and generated data. These data were stored in large, fast-growing databases, and exploiting them to enhance planning, prediction, and decision-making posed a challenge. The Bayesian Knowledge Discovery Project aimed to develop methods and tools, based on sound statistical theory, to meet this challenge.

Results

The contributions could be broadly classified into four main areas:

  • Missing Data
  • Model Search
  • RBE
  • Bayesian Clustering by Dynamics

Missing Data

Bound and Collapse (BC) was a new method to learn conditional probabilities from incomplete databases, developed within the Bayesian Knowledge Discovery project.

Origins: BC originated in a robust method, based on probability intervals, for learning conditional probabilities from incomplete databases. The robust method computed the extreme probability distributions consistent with the available information in the database.

Method: BC was defined by two steps: i) Bound the set of estimates consistent with the available information using the robust method mentioned above, and ii) Collapse the resulting set to a point estimate via a convex combination of the extreme points, with weights depending on the assumed pattern of missing data.
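The two steps can be illustrated with a minimal sketch for a single discrete variable with some missing entries. This is a simplified illustration, not the project's implementation: the function names, the Dirichlet-style pseudo-counts used as a prior, and the choice of weights for an ignorable missing-data pattern are all assumptions made for this example.

```python
from typing import List


def bound(counts: List[float], n_missing: int, prior: List[float]) -> List[List[float]]:
    """Bound step: return one extreme distribution per state.

    Extreme distribution j assigns every missing entry to state j.
    The pseudo-count prior is an assumption of this sketch.
    """
    k = len(counts)
    total = sum(counts) + sum(prior) + n_missing
    return [
        [(counts[i] + prior[i] + (n_missing if i == j else 0)) / total for i in range(k)]
        for j in range(k)
    ]


def collapse(extremes: List[List[float]], weights: List[float]) -> List[float]:
    """Collapse step: convex combination of the extreme distributions."""
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    k = len(extremes)
    return [sum(weights[j] * extremes[j][i] for j in range(k)) for i in range(k)]


def ignorable_weights(counts: List[float], prior: List[float]) -> List[float]:
    """Hypothetical weights: missing entries follow the observed distribution."""
    total = sum(counts) + sum(prior)
    return [(c + a) / total for c, a in zip(counts, prior)]


# Toy data: 30 and 10 observed cases for two states, 10 missing entries.
counts, prior, n_missing = [30, 10], [1.0, 1.0], 10
extremes = bound(counts, n_missing, prior)
estimate = collapse(extremes, ignorable_weights(counts, prior))
print(extremes)
print(estimate)
```

With these weights the collapsed estimate coincides with the estimate computed from the observed entries alone. A different assumed pattern of missing data changes only the weights, and the point estimate moves accordingly within the bounds.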

Applications: BC was a general method to learn conditional probabilities from incomplete databases. It was implemented in Bayesian Knowledge Discoverer, where it was used for:

  • Parameter estimation: Learn the conditional probabilities from incomplete databases.
  • Model Selection: Extract the graphical structure defining a Bayesian Belief Network (BBN) from incomplete databases.

Model Search

Decision-theoretic foundations of Bayesian network model selection.

RBE

A robust Bayesian estimator for incomplete databases.

Bayesian Clustering by Dynamics

A Bayesian method for clustering Markov processes.

Software

The results of the project were implemented in:

Bayesian Knowledge Discoverer (BKD)

Bayesian Knowledge Discoverer (BKD) was a computer program able to learn Bayesian Belief Networks from (possibly incomplete) databases. BKD was based on a new estimation method called Bound and Collapse and was developed within the Bayesian Knowledge Discovery project. Over 3,000 copies of BKD were distributed around the world.

BKD had a commercial successor, Bayesware Discoverer, distributed by Bayesware Limited.

The capabilities of BKD Version 1.0 were:

  • Estimation: Estimation of conditional probability distributions.
  • Model Selection: Bayesian learning of the graphical structure.
  • Propagation: Goal-oriented propagation algorithm.
  • Discretisation: Unsupervised discretisation of continuous variables.
  • Missing Data: Handle missing data using Bound and Collapse.
  • Automated Definition: Automated variable definition from data.
  • BNIF Interface: Import/export for BN Interchange Format.
  • User Interface: Interactive GUI and search process animation.
  • Documentation: Movie-based on-line help.

Platforms: BKD Version 1.0 (Beta) for MS Windows 9x/NT (8.9 MB).

Robust Bayesian Classifier (RoC)

The Robust Bayesian Classifier (RoC) was a computer program able to perform supervised Bayesian classification from incomplete databases, with no assumption about the pattern of missing data. RoC was based on a new estimation method called Robust Bayesian Estimator and it was developed within the Bayesian Knowledge Discovery project.

The capabilities of RoC Version 1.0 were:

  • Training: Estimate conditional probability distributions.
  • Testing: Predict the class label of cases in a database.
  • Discretisation: Continuous variables are discretised.
  • Missing Data: Handle incomplete cases.
  • Cross validation: A cross validation utility for evaluation.
  • Automatic Definition: Attributes are automatically defined from data.
  • User Interface: Easy-to-use wizard interface.
  • Documentation: On-screen, context-sensitive help.

Platforms: RoC Version 1.0 (Beta) for MS Windows 9x/NT (4.9 MB).

People

Marco Ramoni (Knowledge Media Institute)

Paola Sebastiani (Department of Statistics)

Publications

  • M. Ramoni and P. Sebastiani, Learning Bayesian Networks from Incomplete Databases, Technical Report KMi-TR-43, Knowledge Media Institute, The Open University, February 1997. Also in Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, San Mateo, CA, 1997.

  • P. Sebastiani and M. Ramoni, Decision Theoretic Foundations of Graphical Model Selection, Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI98), Madison, WI, 1998.

  • P. Sebastiani and M. Ramoni, Induction of Graphical Models from Incomplete Samples, Proceedings of the XIII Conference of the International Association for Statistical Computing (COMPSTAT-98), Bristol, United Kingdom, 1998.

  • M. Ramoni and P. Sebastiani, Learning Conditional Probabilities from Incomplete Data: An Experimental Comparison, in Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics, Morgan Kaufmann, San Mateo, CA, 1999.

  • P. Sebastiani and M. Ramoni, Model Folding for Data Subject to Nonresponse, in Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics, Morgan Kaufmann, San Mateo, CA, 1999.

  • M. Ramoni, P. Sebastiani, P. Cohen, J. Warwick and J. Davis, Bayesian Clustering by Dynamics, KMi Technical Report KMi-TR-78, Knowledge Media Institute, The Open University, February 1999.