Data Quality Issue | Description | ||||||||||||||||||||
Completeness | Is all the required information available? Are data values missing, or in an unusable state? In some cases, missing data is irrelevant, but when the information that is missing is critical to a specific business process, completeness becomes an issue. Example: if you have an email field where only 50,000 values are present out of a total of 75,000 records, then the email field is 66.6% complete. | ||||||||||||||||||||
Conformity | Are there expectations that data values conform to specified formats? If so, do all the values conform to these formats? Maintaining conformance to specific formats is important in data representation, presentation, aggregate reporting, search, and establishing key relationships. Example: The Gender codes in two different systems are represented differently; in one system the codes are defined as ‘M’, ‘F’ and ‘U’ whereas in the second system they appear as 0, 1, and 2. | ||||||||||||||||||||
Consistency | Do values represent the same meaning? Example: Is revenue always presented in Dollars or also in Euro? | ||||||||||||||||||||
Accuracy | Do data objects accurately represent the “real-world” values they are expected to model? Incorrect spellings of product or person names, addresses, and even untimely or not current data can impact operational and analytical applications. Example: A customer’s address is a valid USPS address. However, the ZIP code is incorrect and the customer name contains a spelling mistake. | ||||||||||||||||||||
Validity | Do data values fall within acceptable ranges? Example: Salary values should be between 60,000 and 120,000 for position levels 51 and 52. | ||||||||||||||||||||
Duplication | Are there multiple, unnecessary representations of the same data objects within your data set? The inability to maintain a single representation for each entity across your systems poses numerous vulnerabilities and risks. Duplicates are measured as a percentage of the overall number of records. There can be duplicate individuals, companies, addresses, product lines, invoices and so on. The following example depicts duplicate records existing in a data set:
|
DQS的认识一
最新推荐文章于 2024-07-01 18:29:41 发布