Compiling a DeepDive application
The first step in running a DeepDive application is to compile it.
This page describes how to use the relevant deepdive commands, why and where compilation happens, and which compiled files are worth looking at.
How to compile
To compile a DeepDive application, simply run the following command within the DeepDive app.
deepdive compile
All subsequent deepdive do commands run what has been compiled, so an app must be compiled at least once before any part of it can be executed.
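For example, a minimal first run might look like the following, where articles stands for a hypothetical relation declared in the app's app.ddlog:
deepdive compile
deepdive do articles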
What is compiled
All compiled output is created under the run/ directory of the app:
run/dataflow.svg
A data flow diagram showing the dependencies between processes. It helps to better understand the dependencies among relations, rules, and the user-defined functions used to produce them. It can be opened in a web browser (Chrome or Safari work particularly well for it).
run/Makefile
A Makefile that mirrors the dependencies, used for generating execution plans for running certain parts of the data flow.
run/process/**/run.sh
Shell scripts that contain what actually should be run for every process. These scripts will be run by certain deepdive do commands.
run/LATEST.COMPILE/ and run/compiled/
Each compilation step creates a unique timestamped directory under run/ to store all the intermediate representations used for compilation (*.json). In particular, the latest compiled version can be found in run/LATEST.COMPILE/. For instance, the deepdive.conf used for the latest compilation can be found at run/LATEST.COMPILE/deepdive.conf.
config.json
The compiled final configuration object.
code-*.json
The compiled objects for generating actual code.
compile.log
The log file left by the latest compilation step.
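For instance, after compiling you can inspect these artifacts directly. The commands below are only a sketch: xdg-open is the usual opener on Linux desktops, open is its macOS counterpart, and any web browser can render the diagram:
xdg-open run/dataflow.svg
less run/LATEST.COMPILE/config.json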
If you keep track of your DeepDive application with Git, it is a good idea to add a /run line to .gitignore so that the compiled output is not committed.
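For example, the following line in the app's .gitignore keeps the compiled output out of version control:
/run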
When to compile
The following files in a DeepDive application are considered the input to deepdive compile:
app.ddlog
deepdive.conf
schema.json
deepdive compile should be rerun whenever changes to these files are to be reflected in the execution done by deepdive do. Otherwise, the modifications to app.ddlog and the other input files are not taken into account, and the last compiled code is simply executed again.
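For example, a typical edit cycle after changing app.ddlog is to recompile and then rerun the affected target; my_relation below is only a hypothetical target name:
vim app.ddlog
deepdive compile
deepdive do my_relation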
Why compile
Compiling a DeepDive application has two main advantages. First, it helps run the application faster by extracting the correct dependencies between the different operations. Second, it performs many checks on the application and helps detect errors even before running DeepDive.
Compile-time checks
All the extractor and table declarations can be written in any order in app.ddlog: the whole dataflow is extracted during compilation and many checks are made. All these checks can also be performed individually with the command deepdive check.
In particular, the following checks are made:
input_extractors_well_defined: checks the sanity of all defined extractors.
input_schema_wellformed: checks that the schema variables are well formed.
compiled_base_relations_have_input_data: all declared tables that are not filled by an extractor must have input data in input/ that can be loaded automatically.
compiled_dependencies_correct: checks that the dependencies are correct. In particular, if the application includes extractors in deepdive.conf, each output relation must be output by exactly one extractor.
compiled_input_output_well_defined: checks that all inputs and outputs of compiled processes are well-defined.
compiled_output_uniquely_defined: checks that all outputs in the compiled documents are defined by exactly one process.
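For example, running the checks on their own, without executing any other part of the app, is as simple as:
deepdive check
The exact syntax for running a single named checker may vary between DeepDive versions; consult the command's usage message.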