# Probabilistic inference and factor graphs

This documents presents a high-level overview of **probabilistic inference** and
an introduction to **factor graphs**, a model used by DeepDive to perform
probabilistic inference.

**Probabilistic inference** is the task of deriving the probability of one or
more random variables taking a specific value or set of values. For example, a
Bernoulli (Boolean) random variable may describe the event that John has cancer.
Such a variable could take a value of 1 (John has cancer) or 0 (John does not
have cancer). DeepDive uses probabilistic inference to estimate the probability
that the random variable takes value 1: a probability of 0.78 would mean that
John is 78% likely to have cancer.

## Factor graphs

A **factor graph** is a type of probabilistic graphical model. A factor graph
has two types of nodes:

**Variables**, which can be either*evidence variables*when their value is known, or*query variables*when their value should be predicted.**Factors**, which define the relationships between variables in the graph. Each factor can be connected to many variables and comes with a**factor function**to define the relationship between these variables. For example, if a factor node is connected to two variables nodes`A`

and`B`

, a possible factor function could be`imply(A,B)`

, meaning that if the random variable`A`

takes value`1`

, then so must the random variable`B`

. Each*factor function*has a*weight*associated with it, which describes how much influence the factor has on its variables in relative terms. In other words, the weight encodes the confidence we have in the relationship expressed by the factor function. If the weight is high and positive, we are very confident in the function that the factor encodes; if the weight is high and negative, we are confident that the function is incorrect. The weight can be learned from training data, or assigned manually.

A **possible world** is an assignment to every variable in a factor graph. The
possible worlds are not usually equiprobable; rather, each possible world has
a different probability. The probability of a possible world is proportional to
a weighted combination of all factor functions in the graph, evaluated at the
assignments specified by the possible world. The weights can be assigned
statically or learned automatically. In the latter case, some *training data*
is needed. Training data define a set of possible worlds and, intuitively, the
learning process chooses the weights by maximizing the probabilities of these
possible worlds.

**Marginal inference** is the task of inferring the probability of one variable
taking a particular value. Using the law of total
probability, it is
straightforward to express this probability as the sum of the probabilities of
possible worlds that contain the requested value for that variable.

Exact inference is an intractable problem on factor graphs, but a commonly used
method in this domain is **Gibbs sampling**. The process starts from a random
possible world and iterates over each variable `v`

, updating its value
by taking into account the factor functions of the factors that `v`

is connected
to and the values of the variables connected to those factors (this is known as
the *Markov blanket* of `v`

). After enough iterations
over the random variables, we can compute the number of iterations during which
each variable had a specific value and use the ratio between this quantity and
the total number of iterations as an estimate of the probability of the variable
taking that value.

## Inference in DeepDive

DeepDive allows the user to write inference
rules to specify how to create the factor
graph. A rule expresses concepts like "If John smokes then he is likely to
have cancer" and, in other words, describes the factor function of a factor and
which variables are connected to this factor. Each rule has a *weight* (either
computed by DeepDive or assigned by the user), which represents the confidence
in the correctness of the rule. If a rule has a high positive *weight*, then the
variables appearing in the rule are likely to take on values that would make the
rule evaluate to true. In the above example, if the rule "If John smokes then he
is likely to have cancer" has a high weight and we are **sure** that John
smokes, then we are also reasonably confident that John has cancer. However, if
we are not sure whether or not John smokes, then we can not be sure about him
having cancer either. In the latter case, both "John does not have cancer," and
"John has cancer" would make the rule evaluate to true.

This is a subtle but very important point. Contrary to many traditional machine
learning algorithms, which often assume that prior knowledge is exact and make
predictions in isolation, DeepDive performs *joint inference*: it determines the
values of all events at the same time. This allows events to influence each
other if they are (directly or indirectly) connected through inference rules.
Thus, the uncertainty of one event (John smoking) may influence the uncertainty
of another event (John having cancer). As the relationships among events become
more complex this model becomes very powerful. For example,
one could imagine the event "John smokes" being influenced by whether or not
John has friends who smoke. This is particularly useful when dealing with
inherently noisy signals, such as human language.

## Additional resources

We refer the reader interested in additional details to other good resources:

- A Simple Factor Graph Tutorial
- An Introduction to Conditional Random Fields for Relational Learning
- PGM class on Coursera
- Graphical Models Lecture at CMU
- Towards High-Throughput Gibbs Sampling at Scale: A Study across Storage Managers
- Factor Graphs and the Sum-Product Algorithm
- Scalable Probabilistic Databases with Factor Graphs and MCMC