DeepDive Quick Start

DeepDive helps you extract structured knowledge from less-structured data with statistical inference without having to write any sophisticated machine learning code. Here we show how you can quickly install and run your first DeepDive application.

Installing DeepDive

First, you can quickly install DeepDive by running the following command and selecting the deepdive option:

bash <(curl -fsSL
### DeepDive installer for Mac
+ curl -fsSL
1) deepdive                 5) postgres
2) deepdive_examples_tests  6) run_deepdive_tests
3) deepdive_from_release    7) spouse_example
4) deepdive_from_source
# Select what to install (enter for all options, q to quit, or a number)? 1

You need a database instance to run any DeepDive application. You can select postgres from DeepDive's installer to install it and spin up an instance on your machine, or just run the following command:

bash <(curl -fsSL postgres

Alternatively, if you have access to a database server, you can configure how to access it as a URL in the application's db.url file.
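
As a minimal sketch of that last option, the snippet below writes a connection URL into a db.url file and reads it back as a quick check. The user, host, port, and database name are placeholders, and the file is written to a scratch directory rather than a real app folder; substitute the details of your own server.

```shell
# Hypothetical connection details: replace with those of your own server.
db_user=alice
db_host=db.example.com
db_port=5432
db_name=deepdive_spouse

# Work in a scratch directory so that an existing db.url is not overwritten.
app_dir=$(mktemp -d)
printf 'postgresql://%s@%s:%s/%s\n' "$db_user" "$db_host" "$db_port" "$db_name" > "$app_dir/db.url"

# Read the URL back and pull out the database name (the part after the last slash).
db_url=$(cat "$app_dir/db.url")
parsed_name=${db_url##*/}
echo "$db_url"
echo "database: $parsed_name"
```

In a real app you would write the same single line to the db.url file at the top of the application folder.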

Running your first DeepDive app

Now, let's see what DeepDive can do for us. We grab a copy of the spouse example app explained in the tutorial. This app extracts mentions of spouses from a corpus of news articles.

bash <(curl -fsSL spouse_example

This will download a copy of the example app's code and data from GitHub to a folder whose name begins with spouse_example-. So, let's move into it:

cd spouse_example-*

Then, check if we have everything there:

ls -F
app.ddlog  db.url  deepdive.conf  input/  labeling/  mindbender/  udf/

1. Load input

You can find some of our sampled datasets under input/. You can also download the full corpus, but let's proceed with the one that has 1000 sampled articles:

ln -s articles-1000.tsv.bz2 input/articles.tsv.bz2
deepdive do articles

This will load the input data into the database. Note that every time you use the deepdive do command, it opens a list of commands to be run in your text editor. You have to confirm it by saving and quitting the editor.
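
One detail of the ln -s command is easy to trip over: the link target is resolved relative to the directory that holds the link (input/), not the current directory, which is why it names articles-1000.tsv.bz2 without the input/ prefix. The following sketch reproduces this in a scratch directory with a stand-in file; the file contents are made up for illustration.

```shell
# Build a scratch layout that mimics the app's input/ folder.
scratch=$(mktemp -d)
mkdir "$scratch/input"
echo "stand-in for the sampled corpus" > "$scratch/input/articles-1000.tsv.bz2"

cd "$scratch"
# Same command as in the quick start: the target path is relative to input/.
ln -s articles-1000.tsv.bz2 input/articles.tsv.bz2

# The link stores the target text verbatim...
link_target=$(readlink input/articles.tsv.bz2)
# ...and reading through the link reaches the file that sits next to it.
link_content=$(cat input/articles.tsv.bz2)
echo "$link_target"
echo "$link_content"
```

Had the target been written as input/articles-1000.tsv.bz2, the link would point at input/input/articles-1000.tsv.bz2 and reading through it would fail.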

Here are a few lines from an example article in the input corpus that has been loaded.

deepdive query '?- articles("5beb863f-26b1-4c2f-ba64-0c3e93e72162", content).' format=csv | grep -v '^$' | tail -n +16 | head
8:30 a.m.
Raeann Meier and Mary Darnell are among the lucky ones to land tickets for Thursday's papal mass at the Basilica of the National Shrine of the Immaculate Conception.
Meier, who's from Round Hill, Virginia, won a pair of tickets in her church lottery and is bringing fellow parishioner Darnell.
Meier says of Francis: ""There is just no pope like this one."" She says ""Jesus hung out with the dregs — the tax collectors, the prostitutes"" and ""that's the way this pope is.""
7:50 a.m.
An elaborate welcoming ceremony full of American pomp and pageantry awaits Pope Francis when he goes to the White House.
The pope is scheduled to arrive by motorcade at about 9 a.m., his car pulling slowly up the South Lawn driveway to a red carpet, where President Barack Obama and his wife, Michelle, will be waiting to greet him.
In front of an estimated 15,000 people who were invited by the White House to witness the historic moment, Obama will then lead Francis to a dais decked out with even more red carpet and red, white and blue bunting, and ringed by military color guards. The Vatican and American national anthems will play. Obama will deliver a welcome address to the pope, followed by the pope's address.
Francis will also receive a thunderous 21-gun salute.
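
The shell pipeline after deepdive query only trims the output for display: grep -v '^$' drops empty lines, tail -n +16 skips the first 15 remaining lines, and head keeps the next 10. A small sketch, with numbered lines standing in for the article text, shows which lines survive:

```shell
# Generate 40 numbered lines with a blank line after each, as a stand-in for article text.
sample=$(seq 1 40 | sed 'G')

# Same filters as above: drop blank lines, start at line 16, keep 10 lines.
shown=$(printf '%s\n' "$sample" | grep -v '^$' | tail -n +16 | head)

first_line=$(echo "$shown" | head -n 1)               # 16
last_line=$(echo "$shown" | tail -n 1)                # 25
line_count=$(echo "$shown" | wc -l | tr -d ' ')       # 10
echo "lines $first_line to $last_line ($line_count lines)"
```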

2. Process input

This app adds some useful NLP markups to the English text using Stanford CoreNLP. Based on the marked up named entity recognition (NER) tags, it can tell which parts of the text mention people's names. All pairs of names appearing in the same sentence are considered as candidates for correct mentions of married couples' names.

deepdive do sentences

After running the NLP markup process, we can see the tokens and NER tags for the example article we saw earlier.

deepdive query '?- sentences("5beb863f-26b1-4c2f-ba64-0c3e93e72162", _, _, tokens, _, _, ner_tags, _, _, _).' format=csv | grep PERSON | tail

We can continue running the processes until all candidates of spousal mentions are mapped, and see the pairs of names from the example article.

deepdive do spouse_candidate
deepdive query 'name1, name2 ?-
    spouse_candidate(p1, name1, p2, name2),
    person_mention(p1, _, "5beb863f-26b1-4c2f-ba64-0c3e93e72162", _, _, _).'
    name1     |    name2
 Raeann Meier | Mary Darnell
 Meier        | Darnell
 Meier        | Francis
 Barack Obama | Francis
 Francis      | Obama
 Barack Obama | Michelle
 Obama        | Francis
 Barack Obama | Francis
(8 rows)
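
The idea behind the candidate list is simple pairing: every two person mentions that share a sentence form one candidate. The sketch below illustrates only that idea with awk; the three names are placeholders, and the app's actual rules (written in app.ddlog) may apply further conditions that are not modeled here.

```shell
# Placeholder person mentions found in a single sentence, one per line.
names='Person A
Person B
Person C'

# Emit every pair (i, j) with i before j, so each pair appears once.
pairs=$(printf '%s\n' "$names" | awk '
    { name[NR] = $0 }                       # remember each mention in order
    END {
        for (i = 1; i <= NR; i++)
            for (j = i + 1; j <= NR; j++)
                print name[i] " | " name[j]
    }')
echo "$pairs"
```

With n mentions in a sentence this yields n(n-1)/2 candidates, so three mentions give three pairs.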

For supervised machine learning, the app continues with extracting features from the context of those candidates and creating a training set programmatically by finding promising positive and negative examples using distant supervision.

3. Run the model

Using the processed data, the app constructs a statistical inference model that predicts whether each candidate is a correct mention of spouses, estimates the parameters (i.e., learns the weights) of the model, and computes the marginal probability of each candidate.

deepdive do probabilities

As a result, DeepDive gives the expectation (probability) of every variable being true. Here are the probabilities computed for the pairs of names from the example article we saw earlier:

deepdive sql "
    SELECT p1.mention_text, p2.mention_text, expectation
    FROM has_spouse_label_inference i, person_mention p1, person_mention p2
    WHERE p1_id LIKE '5beb863f-26b1-4c2f-ba64-0c3e93e72162%'
      AND p1_id = p1.mention_id AND p2_id = p2.mention_id
"
 mention_text | mention_text | expectation
 Raeann Meier | Mary Darnell |       0.129
 Meier        | Darnell      |           0
 Meier        | Darnell      |           0
 Meier        | Francis      |       0.009
 Barack Obama | Francis      |       0.002
 Francis      | Obama        |       0.011
 Barack Obama | Michelle     |       0.648
 Barack Obama | Michelle     |       0.598
 Obama        | Francis      |       0.014
 Barack Obama | Francis      |       0.017
(10 rows)
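
To pick out only the confident predictions, one option is to filter on the expectation column. The sketch below does this with awk over a few rows copied from the table above; the 0.5 cutoff is an arbitrary choice for illustration, not a value recommended by DeepDive.

```shell
# A few rows copied from the output above, in the "a | b | c" table layout.
rows=' Raeann Meier | Mary Darnell |       0.129
 Meier        | Francis      |       0.009
 Barack Obama | Michelle     |       0.648
 Barack Obama | Michelle     |       0.598
 Obama        | Francis      |       0.014'

# Hypothetical cutoff: keep pairs whose expectation exceeds it.
threshold=0.5

likely=$(printf '%s\n' "$rows" | awk -F'|' -v t="$threshold" '
    # $3 + 0 forces a numeric comparison; gsub trims the column padding.
    $3 + 0 > t + 0 {
        gsub(/^ +| +$/, "", $1); gsub(/^ +| +$/, "", $2)
        printf "%s,%s,%s\n", $1, $2, $3 + 0
    }')
echo "$likely"
```

Only the two Barack Obama and Michelle rows pass the cutoff here. The same filter could equally be expressed as an extra condition on expectation in the SQL query.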

DeepDive provides a suite of tools and guidelines to work with the data produced by the application. For instance, below is a screenshot of an automatic interactive search interface DeepDive provides for browsing the processed data with predicted results.

Screenshot of the search interface provided by Mindbender

Next steps

Reading the tutorial and the rest of the DeepDive documentation will prepare you to write your own DeepDive application that can shed light on some dark data and unlock knowledge from it!