Discovery of Genuine Functional Dependencies from Relational Data with Missing Values [Abstract for INFORSID 2019]
Abstract
This article is an extended abstract of our work published at VLDB’2018. The full paper is available at www.vldb.org/pvldb/vol11/p880-berti-equille.pdf .
Functional dependencies (FDs) play an important role in maintaining data quality in relational databases. They can be used to enforce data consistency and to guide data repairs. In this work, we investigate the problem of missing values and its impact on FD discovery. With existing FD discovery algorithms, some genuine FDs cannot be detected precisely because of missing values, while some non-genuine FDs are discovered only because of missing values, depending on the semantics chosen for NULL values. We define the notion of genuineness of FDs and propose algorithms to compute an FD genuineness score, which can be used to identify genuine FDs among the set of all valid dependencies that hold on the data. We evaluate the quality of our method on various real-world and semi-synthetic datasets through extensive experiments. The results show that our method performs well for relatively large FD sets and accurately captures genuine FDs.
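To make the dependence on NULL semantics concrete, the following minimal sketch checks whether a single FD holds on a toy relation under two commonly used interpretations: all NULLs are equal to each other, or every NULL is distinct from every other value. It is an illustration only, not the genuineness-score algorithm of the paper; the function name `fd_holds`, the flag `null_equals_null`, and the sample rows are assumptions introduced for this example.

```python
from itertools import count


def fd_holds(rows, lhs, rhs, null_equals_null):
    """Return True if the FD lhs -> rhs holds on rows (a list of dicts).

    Missing values are represented by None. The flag null_equals_null
    selects the NULL semantics (hypothetical parameter for this sketch):
      True  -> all NULLs compare equal to each other
      False -> each NULL is treated as a distinct value
    """
    fresh = count()  # supplies a unique tag for each NULL when they are distinct

    def key(value):
        if value is None:
            return ("NULL",) if null_equals_null else ("NULL", next(fresh))
        return ("VAL", value)

    seen = {}  # maps each left-hand-side value combination to its right-hand-side value
    for row in rows:
        left = tuple(key(row[attr]) for attr in lhs)
        right = key(row[rhs])
        # A violation occurs when the same left-hand side maps to two right-hand sides.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Toy relation (made-up data): two tuples have a missing zip code.
rows = [
    {"zip": None, "city": "Paris"},
    {"zip": None, "city": "Lyon"},
    {"zip": "13001", "city": "Marseille"},
]

print(fd_holds(rows, ["zip"], "city", null_equals_null=True))
print(fd_holds(rows, ["zip"], "city", null_equals_null=False))
```

On this toy relation, zip -> city is violated when NULLs are considered equal, because the two tuples with a missing zip code disagree on city, but it holds when each NULL is treated as distinct. The validity of the dependency is therefore decided by the missing values and the chosen semantics rather than by the known data.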