Cancer/Smoke/Friends: a classical example for MLNs
This simple DeepDive example is based on a classical Markov Logic Networks example to show probabilistic inference and factor graphs functionalities of DeepDive. You can read more about probabilistic inference and factor graphs in our detailed documentation.
Inference rules
The objectives in this example are to infer whether a person smokes, and whether a person has cancer with a some probability using the factor A => B
.
Here, A => B
reads "A
implies B
", meaning that if A
is true, then B
is also true.
We introduce two rules:
If person A smokes, then A might have a cancer.
If two people A and B are friends and A smokes, then B might also smoke.
These rules can be written in DDlog as follows (in the app.ddlog
file):
person (
person_id bigint,
name text
).
person_has_cancer? (
person_id bigint
).
person_smokes? (
person_id bigint
).
friends (
person_id bigint,
friend_id bigint
).
@weight(0.5)
person_smokes(p) => person_has_cancer(p) :-
person(p, _).
@weight(0.4)
person_smokes(p1) => person_smokes(p2) :-
person(p1, _), person(p2, _), friends(p1, p2).
Setup
Before running the example, please check that DeepDive has been properly installed and the necessary files (app.ddlog
, db.url
, and deepdive.conf) and directories (input/) associated with this example are stored in the current working directory.
Input directory should have the data files (friends.tsv
, person_has_cancer.tsv
, person_smokes.tsv
, and person.tsv
).
In order to use DeepDive, a database instance must be running to accept requests, and the database location must be specified in the db.url
.
You can refer to the tutorial for further detail.
Running
Now you are ready to run the example. First, you have to compile the code using the following command.
deepdive compile
Once it has compiled with no error, you can run the following command to see the list of deepdive targets.
deepdive do
To run the entire pipeline you can run the following command.
deepdive run
This will display a plan for deepdive to run your pipeline.
To start the pipeline, exit the editor with :wq
command.
Results
Once the pipeline has completed running, you can view the results in the database using SQL or DDlog queries. The entire database should look like this:
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------------------------------------------------------+-------+-------+------------+-------------
public | dd_graph_variables_holdout | table | user | 0 bytes |
public | dd_graph_variables_observation | table | user | 0 bytes |
public | dd_graph_weights | view | user | 0 bytes |
public | dd_inference_result_variables | table | user | 8192 bytes |
public | dd_factors_inf_imply_person_smokes_person_has_cancer | table | user | 8192 bytes |
public | dd_factors_inf_imply_person_smokes_person_smokes | table | user | 8192 bytes |
public | dd_weights_inf_imply_person_smokes_person_has_cancer | table | user | 16 kB |
public | dd_weights_inf_imply_person_smokes_person_smokes | table | user | 16 kB |
public | friends | table | user | 8192 bytes |
public | person | table | user | 16 kB |
public | person_has_cancer | table | user | 8192 bytes |
public | person_has_cancer_calibration | view | user | 0 bytes |
public | person_has_cancer_inference | view | user | 0 bytes |
public | person_smokes | table | user | 8192 bytes |
public | person_smokes_calibration | view | user | 0 bytes |
public | person_smokes_inference | view | user | 0 bytes |
(16 rows)
Tables person
, friends
, person_has_cancer
, and person_smokes
hold the input data we prepared under the input/
directory.
To see what DeepDive inferred from our data, you can look at person_smokes_inference
and person_has_cancer_inference
.
The two views should look like the following:
deepdive sql "SELECT * FROM person_smokes_inference"
person_id | dd_id | label | category | expectation
-----------+-------+-------+----------+-------------
4 | 9 | | 1 | 0.643
2 | 7 | | 1 | 0.506
6 | 11 | | 1 | 0.468
5 | 10 | | 1 | 0.451
(4 rows)
deepdive sql "SELECT * FROM person_has_cancer_inference"
person_id | dd_id | label | category | expectation
-----------+-------+-------+----------+-------------
3 | 2 | | 1 | 0.635
1 | 0 | | 1 | 0.614
6 | 5 | | 1 | 0.57
2 | 1 | | 1 | 0.563
4 | 3 | | 1 | 0.563
5 | 4 | | 1 | 0.551
(6 rows)
The dd_id
column is for internal usage and can be ignored by the user and person_id
is the user defined identifier in the input data.
You can see that DeepDive uses the given data and inference rules to predict the probability of the person being a smoker or having cancer in the expectation
column.