"Sixth Framework Programme"

Description of Work

To achieve its goal, the APrIL II consortium has identified the following key applications:
Protein folding: Understanding the organisation of protein fold space. Structural genomics is revealing numerous different protein folds: today there are more than 600 folds, and this number will double over the next few years. Understanding how the component secondary structures are arranged is central to understanding the relationships that result from evolutionary constraints and to predicting function from structure. This knowledge is a major component of extracting functional information from protein folds, with its potential medical benefit. The complexity of the inter-relationships requires a robust formal method of learning that combines probability and logic, rather than ad hoc combinations derived for individual applications. With a robust learning structure in place, there is considerable scope for major computationally driven advances in both fundamental and applied research.
Metabolic pathways: As was shown in the assessment project, probabilistic logic learning is needed to properly integrate current knowledge about the complex systems formed by biochemical processes. Applying these techniques to a real-size problem, such as mammalian cell cycle control, would make it possible to enrich the existing models, fill gaps in them with respect to certain temporal properties, and correct errors or suggest biological experiments.
Haplotype structure for gene mapping: Gene mapping, i.e., discovery of genes predisposing to diseases, is crucial for understanding the genetic background of diseases and for finding good targets for drug development. Recent advances in genetic data measurement techniques, such as the development of dense SNP marker maps, require new techniques in data analysis as well. Understanding the haplotype structure of the human genome is crucial for gene mapping. The goal of this application area is to develop probabilistic logic methods for finding the haplotype structure in human populations, and to develop techniques for comparing the haplotype information against phenotypic data. The results can be immediately applied to gene mapping.
The first two applications were already studied within the assessment project. The APrIL I consortium convincingly demonstrated the need for probabilistic logic learning in these applications. Nevertheless, within the time and resources allocated to an assessment project, it is impossible to develop showcase applications. Producing showcase applications of probabilistic logic learning is exactly the goal of the APrIL II project. In addition to the two applications already present in APrIL I, we also intend to study a third application: gene mapping. Within this application there is also a clear need and opportunity for probabilistic logic learning. The need follows from the characteristics of gene mapping: (1) uncertainty due to variance among genetic information, (2) relational structure because of pedigrees, and (3) the need to learn the mappings automatically from data, owing to the complexity of the task.

To assess the applicability of probabilistic logic learning, the applications will be considered from different perspectives. For the protein folding application, various general purpose probabilistic logic learning techniques will be applied. For the gene mapping and metabolic pathway applications, on the other hand, the idea is to embed (components of) probabilistic logic learning systems in systems and tools that already exist for these applications.

To obtain an adequate understanding of probabilistic logic learning, the APrIL II consortium has identified the following key issues and workpackages:
Probabilistic logic representations need to be developed that correspond to different classes and types of probabilistic representations, together with the corresponding inference methods.
Classes of representations will include sequence-based (such as Markov models and their variants), graph-based (such as Bayesian networks), discrete data structures (such as trees and k-dimensional grids) and grammar-based (such as stochastic context-free grammars). By type we mean whether the probabilistic logic representation is general purpose or embedded. Special attention will be devoted to incorporating mechanisms for dealing with time, space, action, utilities, and qualitative and quantitative issues, and to deriving appropriate and fast inference algorithms.
Learning: Novel learning algorithms need to be developed and existing ones adapted in order to obtain a wide spectrum of machine learning algorithms. Different learning methods are needed for dealing with different settings. Methods for both structure learning and parameter estimation will be studied, and settings such as supervised, unsupervised and reinforcement learning will be addressed.
Systems: The probabilistic logic representations and learning algorithms will be incorporated in various prototypes that should be useful for experimentation and demonstration purposes. Both general purpose and embedded systems will be developed and tested on the applications.
A theory of probabilistic logic learning needs to be developed that addresses the expressivity of alternative probabilistic logic formalisms and primitives, the trade-off between expressive power and computational cost, and the convergence and the complexity of learning (i.e. computational learning theory aspects).
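
As a purely illustrative sketch of the simplest sequence-based representation named above (a first-order Markov model) and of parameter estimation for it, the following Python fragment estimates initial and transition probabilities from counts and scores a sequence. It is not one of the project's systems and contains no logical component. The three-letter secondary-structure alphabet (H for helix, E for strand, C for coil), the toy training strings, the function names and the additive pseudocount are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from math import log

# Hypothetical toy data: secondary-structure strings over an assumed alphabet
# (H = helix, E = strand, C = coil). Not taken from any project dataset.
ALPHABET = "HEC"
TRAINING = ["CHHHHCCEEEC", "CCHHHHHCC", "CEEECCEEEC"]


def estimate_markov_chain(sequences, alphabet, pseudocount=1.0):
    """Estimate a first-order Markov chain from counts.

    Uses relative frequencies with additive smoothing (the pseudocount),
    so unseen transitions receive a small non-zero probability.
    """
    init_counts = Counter()
    trans_counts = defaultdict(Counter)
    for seq in sequences:
        if not seq:
            continue
        init_counts[seq[0]] += 1
        for prev, nxt in zip(seq, seq[1:]):
            trans_counts[prev][nxt] += 1

    k = len(alphabet)
    n_starts = sum(init_counts.values())
    initial = {
        s: (init_counts[s] + pseudocount) / (n_starts + pseudocount * k)
        for s in alphabet
    }
    transition = {}
    for prev in alphabet:
        total = sum(trans_counts[prev].values())
        transition[prev] = {
            nxt: (trans_counts[prev][nxt] + pseudocount) / (total + pseudocount * k)
            for nxt in alphabet
        }
    return initial, transition


def log_likelihood(seq, initial, transition):
    """Log-probability of a sequence under the estimated chain."""
    if not seq:
        return 0.0
    ll = log(initial[seq[0]])
    for prev, nxt in zip(seq, seq[1:]):
        ll += log(transition[prev][nxt])
    return ll


initial, transition = estimate_markov_chain(TRAINING, ALPHABET)
ll_helix = log_likelihood("HHHH", initial, transition)
ll_alternating = log_likelihood("HEHE", initial, transition)
print(ll_helix, ll_alternating)
```

On this toy data an uninterrupted helix scores higher than a sequence alternating between helix and strand, because the training strings contain no direct helix-to-strand transitions. Structure learning and the relational, logic-based extensions that the workpackages target go well beyond this count-based estimate.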
The major deliverable of the APrIL II project will be a book that contains an introduction to the field of probabilistic logic learning and its applications, and provides an overview of the achievements of the project.