Debugging user-defined functions
Many things can go wrong in user-defined functions (UDFs), so debugging support is important for writing the code and verifying that it works as expected. UDFs can be implemented in any programming language as long as they take the form of an executable that reads from standard input and writes to standard output. The tips below cover printing information to the log and running UDFs in isolation, so that issues can be worked through without running the entire data flow of the DeepDive application.
Printing to the log
Remember that the standard output of a UDF is reserved for TSV-formatted data that gets loaded into the database. When a typical print statement is used for debugging, its output won't appear anywhere in the log. Instead, it mangles the TSV output stream and ultimately fails the UDF execution or corrupts its output. The correct way to print log statements is to write them to standard error. Below is an example in Python.
#!/usr/bin/env python
from deepdive import *
import sys

@tsv_extractor
@returns( ... )
def extract( ... ):
    ...
    print >>sys.stderr, 'This prints some_object to logs :', some_object
    ...
During execution of the script, anything written to standard error appears in the console as well as in the file named run.log under the
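The separation between the two streams can also be illustrated without the deepdive module. The following is a minimal sketch, not DeepDive's own API: the names emit_row, log, and process are hypothetical helpers, and the TSV handling is deliberately naive (no escaping of tabs or newlines inside values). It only shows the rule described above: data rows go to standard output, debug messages go to standard error.

```python
import sys


def emit_row(values, out=None):
    """Write one TSV row to the data stream (standard output by default).

    Assumption: values contain no tabs or newlines; no escaping is done.
    """
    if out is None:
        out = sys.stdout
    out.write("\t".join(str(v) for v in values) + "\n")


def log(message, err=None):
    """Write a debug message to the log stream (standard error by default)."""
    if err is None:
        err = sys.stderr
    err.write("[debug] %s\n" % message)


def process(lines, out=None, err=None):
    """Read TSV lines, log what was seen, and emit each row plus its column count."""
    for line in lines:
        columns = line.rstrip("\n").split("\t")
        log("input columns: %r" % (columns,), err)  # goes to the log only
        emit_row(columns + [len(columns)], out)      # goes to the TSV output only

# To use this as a script, call: process(sys.stdin)
```

Because the streams are passed in as parameters, the same functions can be pointed at in-memory buffers to check that no debug text leaks into the TSV output.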
Executing UDFs within DeepDive's environment
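To assist with debugging issues in UDFs, DeepDive provides a wrapper command to execute them directly within the same environment it uses for the actual execution. Running a UDF on its own, outside that environment, can fail. For example, a Python UDF that imports the deepdive module may stop with an error like this: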
Traceback (most recent call last):
  File "udf/fn.py", line 2, in <module>
    from deepdive import *
ImportError: No module named deepdive
Instead, when the command is prefixed with deepdive env, the UDF runs as if it were executed in the middle of DeepDive's data flow.
deepdive env python udf/fn.py
This will take TSV rows from standard input and print TSV rows to standard output as well as debug logs to standard error. It can therefore be debugged just like a normal Python program.
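To try a UDF on just a handful of rows, the command can be driven from a small test harness. The sketch below is one possible approach, not something DeepDive provides: run_udf is a hypothetical helper that pipes sample TSV rows into any command and returns its exit code, its output rows, and whatever it wrote to standard error. The only part taken from the text above is the command itself.

```python
import subprocess


def run_udf(command, rows):
    """Feed sample rows to a UDF command as TSV and collect the results.

    command: argument list, e.g. ["deepdive", "env", "python", "udf/fn.py"]
    rows:    list of rows, each a list of strings (no tabs or newlines inside)
    Returns (exit_code, output_rows, stderr_text).
    """
    tsv_input = "".join("\t".join(row) + "\n" for row in rows)
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # exchange text rather than bytes
    )
    # communicate() writes the input, closes stdin, and reads both streams
    out, err = proc.communicate(tsv_input)
    output_rows = [line.split("\t") for line in out.splitlines()]
    return proc.returncode, output_rows, err
```

A call such as run_udf(["deepdive", "env", "python", "udf/fn.py"], [["1", "some text"]]) would then return the rows the UDF printed separately from its debug log, which makes it easy to inspect both for a few hand-picked inputs. The sample row here is made up and would need to match the columns the UDF actually expects.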