How do I Maintain Data Warehouse Quality?

By Carol Francois
Updated: May 17, 2024

Four primary factors determine how well you can maintain data warehouse quality: data integrity, the data input source and import methodology, the frequency of data imports, and the audience. A data warehouse is an electronic repository for large quantities of data. Businesses and other large organizations increasingly use one to store data in a form that supports reporting and other data output requirements. The usefulness of a data warehouse is driven primarily by the quality of its data and its responsiveness to user requirements.

Data integrity is central to data warehouse quality. It covers the rules governing the relationships among the data, dates, definitions and business rules that shape the relevance of the data to the organization. Keeping the data consistent and reconcilable is the foundation of data integrity. Steps to maintain data warehouse quality must include a cohesive data architecture plan, regular inspection of the data, and rules and processes that keep the data consistent whenever possible.
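
A regular inspection of this kind can be partly automated. The following is a minimal sketch, assuming the warehouse rows have already been read into Python dictionaries; the table contents, field names and the two helper functions are hypothetical and only illustrate two common checks: rows that cannot be reconciled with a reference table, and keys that carry conflicting values.

```python
from collections import defaultdict


def find_orphans(fact_rows, dimension_keys, key_field):
    """Return fact rows whose key has no match in the reference table."""
    return [row for row in fact_rows if row[key_field] not in dimension_keys]


def find_conflicting_values(rows, key_field, value_field):
    """Return keys that map to more than one distinct (normalized) value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key_field]].add(row[value_field].strip().lower())
    return {key: values for key, values in seen.items() if len(values) > 1}


# Hypothetical sample data, not taken from any real system.
customers = [
    {"customer_id": 1, "name": "Acme Corp"},
    {"customer_id": 1, "name": "ACME Corporation"},
    {"customer_id": 2, "name": "Globex"},
]
orders = [
    {"order_id": 100, "customer_id": 1},
    {"order_id": 101, "customer_id": 3},  # refers to a customer that does not exist
]

customer_ids = {c["customer_id"] for c in customers}
orphans = find_orphans(orders, customer_ids, "customer_id")
conflicts = find_conflicting_values(customers, "customer_id", "name")

print("Orders without a matching customer:", orphans)
print("Customer IDs with conflicting names:", conflicts)
```

Checks like these only flag candidates. Deciding which name is correct, or where an unmatched order belongs, usually still needs someone who knows the business.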

The data input source for a data warehouse is typically an import tool or program. The easiest way to maintain data warehouse quality is to implement rules and checkpoints in the data import program itself. Data that does not follow the expected pattern is not added to the data warehouse; instead, a user must intervene to correct or reconcile the data, or to change the program. In many organizations, only the data warehouse architect can make such changes, which helps protect data warehouse quality by keeping the import rules under a single point of control.
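
As an illustration of rules and checkpoints inside an import program, here is a minimal sketch. The field names, the `PO-` identifier pattern and the `RULES` table are assumptions made for the example, not a description of any particular import tool. Records that break a rule are set aside with the names of the failing fields so that someone can correct them.

```python
import re
from datetime import datetime


def _is_iso_date(value):
    """True if value is a string in YYYY-MM-DD form."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical rules: one check per field, each returning True when the value is valid.
RULES = {
    "order_id": lambda v: isinstance(v, str) and re.fullmatch(r"PO-\d{6}", v) is not None,
    "order_date": _is_iso_date,
    "amount": lambda v: isinstance(v, (int, float)) and v >= 0,
}


def validate(record):
    """Return the fields that break a rule (an empty list means the record is clean)."""
    return [field for field, rule in RULES.items() if not rule(record.get(field))]


def import_batch(records):
    """Split a batch into accepted records and rejected (record, failures) pairs."""
    accepted, rejected = [], []
    for record in records:
        failures = validate(record)
        if failures:
            rejected.append((record, failures))  # held back for user intervention
        else:
            accepted.append(record)
    return accepted, rejected


batch = [
    {"order_id": "PO-000123", "order_date": "2024-05-17", "amount": 250.0},
    {"order_id": "123", "order_date": "17/05/2024", "amount": -5},
]
accepted, rejected = import_batch(batch)
print(len(accepted), "accepted;", len(rejected), "rejected")
for record, failures in rejected:
    print("Rejected", record, "because of", failures)
```

Keeping the rules in one table makes it easy to see what the import enforces and to restrict who may change it.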

The accuracy and relevance of the data are essential to maintaining data warehouse quality. The timing and frequency of imports have a large impact on the overall usefulness of the tool, as well as on its quality. For example, if purchase order information is entered into the warehouse but invoices are updated only intermittently, the ability to report accurately on purchase-related activity is compromised.
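
One way to catch the purchase order and invoice mismatch described above is to track when each feed was last loaded and flag any feed that has fallen behind. This is a minimal sketch; the feed names, the timestamps and the one-day limits are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def stale_feeds(last_loaded, max_age, now):
    """Return the names of feeds whose latest load is older than their allowed age."""
    return sorted(
        feed
        for feed, loaded_at in last_loaded.items()
        if now - loaded_at > max_age[feed]
    )


# Hypothetical load history: purchase orders loaded overnight, invoices a week ago.
now = datetime(2024, 5, 17, 9, 0)
last_loaded = {
    "purchase_orders": datetime(2024, 5, 17, 2, 0),
    "invoices": datetime(2024, 5, 10, 2, 0),
}
max_age = {
    "purchase_orders": timedelta(days=1),
    "invoices": timedelta(days=1),
}

stale = stale_feeds(last_loaded, max_age, now)
print("Feeds that are out of date:", stale)
```

A report that joins purchase orders to invoices could run this check first and warn the reader when one side is out of date.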

Data warehouse quality is easiest to maintain and support when users are knowledgeable and have a solid understanding of the business processes. Training users not only to build queries but also to understand the underlying data warehouse structure enables them to identify inconsistencies much faster and to highlight potential issues early in the process. Any changes to the data tables, structure or linkages, and any addition of new data fields, must be reviewed with the entire team of users and support staff members to ensure a consistent understanding of the risks and challenges that might occur.
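
A review of structural changes is easier when the team has a plain list of what changed. The sketch below assumes each schema version is described as a dictionary of table names mapped to column names and types; the `diff_schema` helper and the sample tables are hypothetical.

```python
def diff_schema(old, new):
    """List added, removed and retyped columns between two schema snapshots."""
    changes = []
    for table in sorted(set(old) | set(new)):
        old_cols = old.get(table, {})
        new_cols = new.get(table, {})
        for col in sorted(set(old_cols) | set(new_cols)):
            if col not in old_cols:
                changes.append(f"added {table}.{col} ({new_cols[col]})")
            elif col not in new_cols:
                changes.append(f"removed {table}.{col}")
            elif old_cols[col] != new_cols[col]:
                changes.append(
                    f"retyped {table}.{col}: {old_cols[col]} -> {new_cols[col]}"
                )
    return changes


# Hypothetical before and after snapshots of one table.
old = {"orders": {"order_id": "TEXT", "amount": "REAL"}}
new = {"orders": {"order_id": "TEXT", "amount": "INTEGER", "currency": "TEXT"}}

changes = diff_schema(old, new)
for change in changes:
    print(change)
```

The resulting list can be circulated to users and support staff before the change is applied.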

Discussion Comments
By SkyWhisperer — On Dec 25, 2011

@everetra - I would prefer more power in data warehouse reporting. I’ve been able to pull reports, but it seems that the data architects create these subsets of data for us to work on.

That’s okay I suppose, but sometimes I want to pull in larger data sets. I have to jump through hoops to get the architect to load more data into the warehouse for us to work on.

I understand his position. He wants to make sure we only get what we need to do our job, and not bog down the server. Still, I find it a bit too restrictive for my needs.

By everetra — On Dec 24, 2011

@NathanG - I think it’s important not to give end users too much power. That way they can’t break the data, so to speak. For example, in our company end users pretty much have read-only access. That means they can build basic “SELECT” queries but they can’t run queries that will modify the tables in any way.

If they enter data, the data gets checked for proper formatting and stuff like that at the entry stage. If it doesn’t follow proper formatting it gets flagged and doesn’t get entered.

We use business intelligence software to input the information. With business intelligence, data warehouse checks are put in place so that only good data enters the system. Of course, these checks are only as good as the programmers who define them, so it all works as long as they’ve done their jobs properly.

By NathanG — On Dec 24, 2011

@David09 - Well, one way to avoid the data integrity issue is to closely monitor how the data gets there in the first place. As long as everyone is following the same data warehouse methodology then the data entered should follow the same format.

As an architect I think you could be the gatekeeper in this respect. You could create a data entry form for all users to use to enter the data.

If data is not coming in through a data entry form but instead is being bulk loaded (like through another server or something) then I suppose that you would have to stipulate what the data warehouse requirements are for the data that you’re putting in.

By David09 — On Dec 23, 2011

Of all the components listed for data warehouse ETL, I would have to say that data integrity is the most important. The most common problem that I’ve seen in businesses I’ve worked at is a lack of consistency.

For example, a simple customer record may be listed with different names but they are all the same customer. Of course, if you have a customer ID then that should help you tie the records together. But these records themselves may have information that is inconsistent.

So while it seems at first glance that it would be an automated process, at some point you will have to ask around to prune up the data correctly. It tends to be very time consuming in the end I think.
