Dirty data is electronic data that is outdated, incomplete, duplicated, or otherwise inaccurate. It can result from errors in data entry, from a failure to update records on a regular basis, or from entering the same data more than once. Sometimes the problem is nothing more than punctuation errors in the text of electronic documents. In other instances, dirty data is intentionally misleading, such as accounting records that have been modified to present a particular image to investors and others.
For the most part, dirty data accumulates in a database unintentionally. People entering new information may misspell words, leave out punctuation that is needed to understand the text, or fail to follow the required format. In these situations, correction is relatively simple: edit the faulty text and save the changes. Businesses sometimes manage this process by proofreading data after it is entered and making the necessary updates.
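Formatting differences of the kind described above can hide the fact that the same entry was made more than once. The following is a minimal sketch, not a definitive cleaning routine: the function names `normalize` and `find_duplicates` and the sample company names are hypothetical, and the approach only catches entries that differ in case, punctuation, or spacing. It would not catch misspellings.

```python
import re
from collections import defaultdict


def normalize(value: str) -> str:
    """Lowercase the text, drop punctuation, and collapse runs of whitespace."""
    value = re.sub(r"[^\w\s]", "", value.lower())
    return " ".join(value.split())


def find_duplicates(records: list[str]) -> list[list[str]]:
    """Group entries that become identical once normalized."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize(record)].append(record)
    # Keep only the groups that contain more than one original entry.
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample entries, as they might be typed by different people.
names = ["Acme Corp.", "acme corp", "Globex, Inc.", "ACME  Corp", "Initech"]
duplicates = find_duplicates(names)
```

Here `duplicates` holds one group containing the three spellings of the same company, which a person could then review and merge by hand.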
Dirty data also arises when existing records are not updated as information changes. For example, if salespeople fail to update customer files when a customer's personnel change, those files are no longer accurate and are considered dirty. As with spelling and punctuation errors, taking the time to replace outdated information with current data increases the overall usability of the database.
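One possible way to find records that may be out of date is to flag any record that has not been touched within a chosen period. The sketch below assumes each record carries a `last_updated` date, which the text above does not specify. The field names, the one-year threshold, and the sample customers are all hypothetical. A flagged record is only a candidate for review, since an old record may still be correct.

```python
from datetime import date, timedelta


def stale_records(records, today, max_age_days=365):
    """Return the customers whose record is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [r["customer"] for r in records if r["last_updated"] < cutoff]


# Hypothetical customer files with the date each was last edited.
customers = [
    {"customer": "Acme", "last_updated": date(2021, 3, 1)},
    {"customer": "Globex", "last_updated": date(2024, 1, 15)},
]
flagged = stale_records(customers, today=date(2024, 6, 1))
```

Passing `today` as an argument rather than reading the system clock keeps the check repeatable.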
In some situations, dirty data is created intentionally. A company may omit information from a database in order to shape perceptions of its finances, for example by highlighting the revenue generated in a given period while choosing not to enter the revenue actually collected in that period. With this type of dirty data, the information presented is accurate as far as it goes, but it is incomplete.
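A reviewer could look for this kind of gap with a simple completeness check. The sketch below assumes that generated and collected revenue are stored per period, which the text does not state. The function name `missing_collected`, the period labels, and the figures are invented for illustration. The check shows only that an entry is absent, not why it is absent.

```python
def missing_collected(generated: dict[str, float],
                      collected: dict[str, float]) -> list[str]:
    """List periods that report generated revenue but no collected revenue."""
    return sorted(period for period in generated if period not in collected)


# Hypothetical figures: the second quarter has no collected-revenue entry.
generated = {"2024-Q1": 120000.0, "2024-Q2": 135000.0}
collected = {"2024-Q1": 98000.0}
gaps = missing_collected(generated, collected)
```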
With some types of dirty data, a business may decide that corrections are not worth the time and effort. This is common when the incorrect data does not affect the ability of the business to function properly and is unlikely to cause serious problems. As a result, just about any entity that maintains a database probably has at least a little dirty data interspersed with information that is current and accurate.