Data duplication happens all the time. It is an inevitable phenomenon as millions of data are gathered at very short intervals. A data warehouse is basically a database and having unintentional duplication of records created from the millions of data from other sources can hardly be avoided. In the data warehousing community, the task of finding duplicated records within large databases has long been a persistent problem and has become an area of active research. There have been many research undertakings to address the problems of data duplication caused by duplicate contamination of data. Several approaches have been implemented to counter the problem of data duplication. One approach is manually coding rules so that data can be filtered to avoid duplication. Other approaches include having applications of the latest machine learning techniques or more advance business intelligence applications. The accuracy of the different methods for countering data duplication varies. For very large data collection implementing some of the methods may be too complex and also expensive to be deployed in their full capacity.
Copyright © 2026 eLLeNow.com All Rights Reserved.