Title:
Quality of Operational Data: A Challenge for Datawarehouse Design
Abstract: Information systems allow companies and organizations
to collect large amounts of operational and transactional data. Data warehousing
provides tools and techniques for analysing these data and deriving information
at a level of abstraction suitable to support decision processes. However,
fundamental choices have to be made when configuring a data
warehouse and importing data: the user has to explicitly tell the system
(i) which data have to be analysed, (ii) which attributes have to be considered
as measures and which as dimensions in the data warehouse, (iii) how far
each dimension should be generalized and, finally, (iv) what the quality
and reliability of the data warehouse and of the related decision process are. Since
these choices are not trivial tasks for users (especially for complex databases),
and since they can heavily influence the effectiveness of the data warehouse,
in this paper we propose a methodology based on a set of statistical indexes
that allows one to derive information on data quality. We are currently experimenting
with our methodology on a set of real-world operational databases and studying
the preliminary results.
Author: Maurizio Pighin and Lucio Ieronutti
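The abstract does not define the statistical indexes themselves. As a purely illustrative sketch, assuming two hypothetical indexes (completeness, the share of non-missing values, and cardinality ratio, the share of distinct values among non-missing ones) and a hypothetical threshold rule, the Python code below shows how per-attribute statistics could inform choice (ii), suggesting which attributes might serve as measures and which as dimensions. The names `profile_attribute`, `AttributeProfile` and `dim_threshold` are invented for this example; this is not the authors' methodology.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class AttributeProfile:
    """Hypothetical per-attribute statistics (not the indexes defined in the paper)."""
    name: str
    completeness: float       # share of values that are not None
    cardinality_ratio: float  # distinct non-null values / non-null values
    numeric: bool             # True if every non-null value is an int or float
    suggested_role: str       # "measure" or "dimension" under the assumed rule


def is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def profile_attribute(name: str, values: Sequence[Any],
                      dim_threshold: float = 0.1) -> AttributeProfile:
    """Compute illustrative quality indexes for one column of operational data.

    Assumed rule: a numeric attribute whose cardinality ratio exceeds
    dim_threshold is suggested as a measure; anything else (categorical,
    or numeric with few distinct values) is suggested as a dimension.
    """
    non_null = [v for v in values if v is not None]
    completeness = len(non_null) / len(values) if values else 0.0
    cardinality_ratio = len(set(non_null)) / len(non_null) if non_null else 0.0
    numeric = bool(non_null) and all(is_number(v) for v in non_null)
    role = "measure" if numeric and cardinality_ratio > dim_threshold else "dimension"
    return AttributeProfile(name, completeness, cardinality_ratio, numeric, role)


if __name__ == "__main__":
    # Toy column-oriented table; None marks a missing value.
    sales = {
        "store": ["A", "A", "B", "B", "C", "C", None, "A"],
        "amount": [12.5, 7.0, 30.2, 12.5, 8.9, 41.0, 3.3, 19.9],
    }
    for column, values in sales.items():
        p = profile_attribute(column, values)
        print(f"{p.name}: completeness={p.completeness:.2f}, "
              f"cardinality={p.cardinality_ratio:.2f}, role={p.suggested_role}")
```

On the toy table, `store` is suggested as a dimension (it is categorical and has one missing value, lowering its completeness) and `amount` as a measure. The threshold is arbitrary; a numeric attribute with few distinct values, such as a year, would fall on the dimension side, which is why a single type check alone is not enough.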