The methodology underlying APrIL II is based on the assumption that,
in order to understand the problem of probabilistic logic learning
and its applicability, various representations as well as
applications have to be studied. Various probabilistic
representations (and corresponding inference and learning engines)
are needed to cope with different types of problems. This situation
is akin to that in traditional approaches to probabilistic
representations and learning. Indeed, Bayesian networks and
stochastic context free grammars are just two examples of quite
different (even complementary) representations. Furthermore,
expressivity usually has to be balanced with efficiency. Also, within
the field of machine learning, different settings such as
supervised and unsupervised learning require different techniques.
When time and action are important, Markov decision processes and
reinforcement learning come into play. Nevertheless, despite the
different settings, representations and algorithms, many of the
underlying principles remain the same, such as maximum likelihood,
Bayesian approaches, minimum description length,
Expectation-Maximization (EM), gradients, MCMC, etc. The APrIL II
project aims at identifying the underlying principles of
probabilistic logic learning through the investigation of different
settings and representations for probabilistic logic learning.
This is also the underlying motivation for the workpackages WP 1
(Representation) and WP 2 (Learning).
A second methodological guideline comes from the application
perspective. As it is our goal to obtain an appreciation of the
applicability of probabilistic logic learning, APrIL II will
develop different types and classes of probabilistic logic
learning systems and apply them in a variety of different
applications.
The selected application domains all require probabilistic logic
learning, but they differ considerably in the requirements they
impose on it. Metabolic pathways, for instance, can be modelled as
graph structures (related to Bayesian networks). Proteins and
genetic information, on the other hand, possess a sequential nature,
which may make them better suited for modelling with approaches
based on (hidden) Markov models or grammars. In
addition, two different types of probabilistic logic learning
systems will be applied. For the protein folding domain,
general-purpose probabilistic logic learning methods will be applied,
whereas for the metabolic pathways and haplotype applications,
probabilistic logic learning components will be embedded into
methods and systems that already exist for these applications.
Furthermore, different classes of probabilistic logic learning,
such as sequence-based, graph-based, discrete-structure-based and
grammar-based approaches, will be considered. This motivates the
workpackages WP 3 (Systems) and WP 4 (Applications).
Despite the fact that the APrIL II project will investigate
different settings, types and classes for probabilistic logic
learning, it should be pointed out that the resulting
representations and algorithms are strongly connected to one
another. This situation is akin to propositional probabilistic
logic learning, where coherent principles, formalisms and
algorithms have been developed. Indeed, the embedded systems to be
developed within APrIL II will contain some of the core components
of the general-purpose ones, and experience with the embedded
components should in turn provide valuable feedback at the
general-purpose level. Likewise, the different classes of
probabilistic logic learning form a coherent whole (cf. Section
7.7, WP 1 for more details on the relationship).
Finally, the insights obtained at all levels concerning
probabilistic logic learning should allow us to identify a core
theory of probabilistic logic learning (WP 5). This theory should
abstract from specific representations, learning approaches and
settings as much as possible, and it should serve as the basis for
further developments in the area.
In addition, there are the usual workpackages concerned with
Dissemination (WP 6) and Management (WP 7).
APrIL II will have three milestones, one milestone at the end of
each year for the probabilistic logic learning techniques as well
as for each of the applications:
Milestone A:
- Problem formulation and data collection for each of the
applications.
- New and missing components of probabilistic logic
representations and inference methods have been identified and
designed.
- Components of an initial theory of probabilistic logic
learning are formulated.
Milestone B:
- Experiments with real data and prototype probabilistic
logic learning systems are running.
- New and missing components of probabilistic logic learning
algorithms have been identified and designed.
- (Possibly embedded) prototypes of probabilistic logic
learning systems have been implemented.
- A refined theory of probabilistic logic learning is
formulated.
Milestone C:
- The applications have been turned into show-cases for
probabilistic logic learning.
- The systems are ready for use on other applications as
well.
- An integrated theory of probabilistic logic learning is
available.
- Final APrIL II Report.
Two key deliverables of the APrIL II project are 1) the APrIL
book (D20), which will provide an overview of the field of
probabilistic logic learning (based on overviews of the different
workpackages), and 2) the APrIL repository (D19), which will contain
publications, software and data sets on probabilistic logic
learning and its applications.