Papers
DeepDive System Overview
- Best Reference: DeepDive: A Data Management System for Automatic Knowledge Base Construction. Ce Zhang. Ph.D. Dissertation, University of Wisconsin-Madison, 2015.
Applications
- Best Reference: Machine-compiled Macroevolutionary History of Phanerozoic Life. Shanan E. Peters, Ce Zhang, Miron Livny, and C. Ré. PLoS ONE.
- The Mobilize Center: an NIH big data to knowledge center to advance human movement research and improve mobility. Joy P Ku, Jennifer L Hicks, Trevor Hastie, Jure Leskovec, C. Ré, and Scott L Delp. AMIA, 2015.
- Large-scale extraction of gene interactions from full text literature using DeepDive. Emily Mallory, Ce Zhang, C. Ré, and Russ Altman. Bioinformatics 2015.
- A Demonstration of Data Labeling in Knowledge Base Construction. Jaeho Shin, Mike Cafarella, and C. Ré. VLDB Demo, 2015.
- Elementary: Large-scale Knowledge-base Construction via Machine Learning and Statistical Inference. F. Niu, Ce Zhang, C. Ré, and J. Shavlik. IJSWIS, Special Issue on Knowledge Extraction from the Web, 2012.
- Stanford’s 2014 Slot Filling Systems. Gabor Angeli, Sonal Gupta, Melvin Jose, Christopher D. Manning, Christopher Ré, Julie Tibshirani, Jean Y. Wu, Sen Wu, Ce Zhang.
- Using Social Media to Measure Labor Market Flow. D. Antenucci, M. Cafarella, M. Levenstein, C. Ré, and M. Shapiro. NBER. Selected for NBER Digest.
- Feature Selection in Enterprise Analytics: A Demonstration using an R-based Data Analytics System. P. Konda, A. Kumar, C. Ré, and V. Sashikanth. VLDB Demo 2013.
- GeoDeepDive: Statistical Inference using Familiar Data-Processing Languages. Ce Zhang, V. Govindaraju, J. Borchardt, T. Foltz, C. Ré, and S. Peters. SIGMOD 13 (demo).
- Hazy: Making it Easier to Build and Maintain Big-data Analytics. Arun Kumar, Feng Niu, and C. Ré. ACM Queue, 2013. Invited to CACM March 2013.
- Building an Entity-Centric Stream Filtering Test Collection for TREC 2012. J. R. Frank, M. Kleiman-Weiner, D. A. Roberts, F. Niu, Ce Zhang, C. Ré, and I. Soboroff. TREC 2013.
Feature Engineering
- Best Reference: Feature Engineering for Knowledge Base Construction. DeepDive Group. Data Engineering Bulletin.
- Brainwash: A Data System for Feature Engineering. M. Anderson et al. CIDR Conference 2013 (Vision Track)
- Materialization Optimizations for Feature Selection. Ce Zhang, Arun Kumar, and C. Ré. SIGMOD 2014. Best Paper Award.
- Parallel Feature Selection Inspired by Group Testing. Y. Zhou et al. NIPS 2014.
The Underlying Engine
- Best Overall Description: Incremental Knowledge Base Construction Using DeepDive. Sen Wu, Ce Zhang, Christopher De Sa, Jaeho Shin, Feiran Wang, and C. Ré. VLDB 2015.
- Best Reference for Sampler: DimmWitted: A Study of Main-Memory Statistical Analytics. Ce Zhang and C. Ré. VLDB 2014.
- Best Reference for Grounding: Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS. F. Niu, C. Ré, A. Doan, and J. W. Shavlik. PVLDB 11.
- Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width. Christopher De Sa, Ce Zhang, Kunle Olukotun, and C. Ré. NIPS 2015 (Spotlight).
- Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms. Christopher De Sa, Ce Zhang, Kunle Olukotun, and C. Ré. NIPS 2015.
- Towards High-Throughput Gibbs Sampling at Scale: A Study across Storage Managers. Ce Zhang and C. Ré. SIGMOD 2013.
- An Asynchronous Parallel Stochastic Coordinate Descent Algorithm. J. Liu, S. Wright, C. Ré, V. Bittorf, S. Sridhar. ICML 2014.
- Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. F. Niu, B. Recht, C. Ré, and S. J. Wright. NIPS 2011.
- Felix: Scaling Inference for Markov Logic with an Operator-based Approach. Feng Niu, Ce Zhang, C. Ré, and Jude Shavlik.
Some Related Techniques
- Understanding Tables in Context Using Standard NLP Toolkits. Vidhya Govindaraju, Ce Zhang, and C. Ré. ACL 2013
- Probabilistic Management of OCR using an RDBMS. Arun Kumar and C. Ré. PVLDB 2012.
- Optimizing Statistical Information Extraction Programs Over Evolving Text. Fei Chen, Xixuan Feng, C. Ré, and Min Wang. ICDE
- Incrementally maintaining classification using an RDBMS. Mehmet Levent Koc and C. Ré. PVLDB Volume 4, 2011, p. 302-313
Studies
- Big Data versus the Crowd: Looking for Relationships in All the Right Places. Ce Zhang, Feng Niu, C. Ré, and Jude Shavlik. ACL, 2012.
Formal Foundations
- Probabilistic Databases. Dan Suciu, Dan Olteanu, C. Ré, and Christoph Koch. Morgan & Claypool's Synthesis Lectures, 2011.
- Transducing Markov Sequences. Benny Kimelfeld and C. Ré. JACM 2014.
- Queries and materialized views on probabilistic databases. Nilesh N. Dalvi, C. Ré, and Dan Suciu. JCSS 2011.