KPMG - Data Quality Assessment
- Varun 2001
- Jan 20, 2022
Task 1
Data Quality Assessment
Assessment of data quality and completeness in preparation for analysis


Here is the background information on your task
Sprocket Central Pty Ltd, a medium-size bikes & cycling accessories organization, has approached Tony Smith (Partner) in KPMG’s Lighthouse & Innovation Team. Sprocket Central Pty Ltd is keen to learn more about KPMG’s expertise in its Analytics, Information & Modelling team.
Smith discusses KPMG’s expertise in this space. In particular, he speaks about how the team can effectively analyze the datasets to help Sprocket Central Pty Ltd grow its business.
Primarily, Sprocket Central Pty Ltd needs help with its customer and transactions data. The organization has a large dataset relating to its customers, but its team is unsure how to effectively analyze it to help optimize its marketing strategy.
However, in order to support the analysis, you speak to the Associate Director for some ideas and she advised that “the importance of optimizing the quality of customer datasets cannot be underestimated. The better the quality of the dataset, the better chance you will be able to use it to drive company growth.” The client provided KPMG with 3 datasets:
Customer Demographic
Customer Addresses
Transactions data in the past 3 months
You decide to start the preliminary data exploration and identify ways to improve the quality of Sprocket Central Pty Ltd’s data.

TASK
Draft an email to the client identifying the data quality issues and strategies to mitigate these issues. Refer to ‘Data Quality Framework Table’ and resources below for criteria and dimensions which you should consider.
SOLUTION
Dear [Client point-of-contact],
Thank you for providing us with the three datasets from Sprocket Central Pty Ltd. The table below highlights the summary statistics for the three datasets received.
Notable data quality issues that were encountered and the methods used to mitigate the identified data inconsistencies are as follows. Furthermore, recommendations have been provided to avoid the recurrence of data quality issues and improve the accuracy of the underlying data used to drive business decisions.
Ensuring Data Quality in the Customer Demographic Dataset
● In the Gender column, there are misspelled and abbreviated entries
Recommendation: Correct the misspelled entries, and replace the abbreviations F and M with Female and Male respectively
● In the DOB column, there is an outlier: one customer's date of birth implies an age of 175
Recommendation: Remove this record from the dataset
● In the Job Title column, there are many blank entries
Recommendation: Remove the records with a blank job title
● In the Deceased Indicator column, there are two values, Y and N
Recommendation: Remove the Y (deceased) entries so that further analysis reflects only living customers
● In the Default column, the entries are malformed and invalid
Recommendation: Delete the entire column
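The Customer Demographic recommendations above can also be scripted so the same cleaning pass can be rerun on each new extract. The snippet below is a minimal sketch, not the original solution. It assumes each record is a dict with the hypothetical keys `gender`, `DOB` (already parsed to a `date`), `job_title`, `deceased_indicator` and `default`; the misspelling `"Femal"`, the 120-year age cut-off and the caller-supplied reference date are all assumptions for illustration.

```python
from datetime import date

# Hypothetical mapping: F and M come from the post; "Femal" is only an
# assumed example of a misspelling. Extend it with the values actually found.
GENDER_MAP = {"F": "Female", "Femal": "Female", "M": "Male"}
MAX_PLAUSIBLE_AGE = 120  # assumed cut-off for an implausible age


def age_on(dob, ref):
    """Completed years between dob and ref (both datetime.date)."""
    return ref.year - dob.year - ((ref.month, ref.day) < (dob.month, dob.day))


def is_blank(value):
    """True for None or for strings that are empty after trimming."""
    return value is None or (isinstance(value, str) and not value.strip())


def clean_demographics(rows, ref_date):
    cleaned = []
    for row in rows:
        row = dict(row)  # work on a copy; the input rows are left untouched
        # Gender: expand abbreviations and fix known misspellings.
        row["gender"] = GENDER_MAP.get(row["gender"], row["gender"])
        # DOB: drop records whose implied age is implausible (e.g. 175).
        dob = row.get("DOB")
        if dob is not None and age_on(dob, ref_date) > MAX_PLAUSIBLE_AGE:
            continue
        # Job title: drop records with a blank job title.
        if is_blank(row.get("job_title")):
            continue
        # Deceased indicator: drop customers marked as deceased ("Y").
        if row.get("deceased_indicator") == "Y":
            continue
        # Default: remove the invalid column entirely.
        row.pop("default", None)
        cleaned.append(row)
    return cleaned
```

Because each row is copied before it is changed, the raw extract stays available for comparison or auditing after the clean-up.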
Ensuring Data Quality in the Customer Address Dataset
● In the State column, states appear both as abbreviations and as full names
Recommendation: Replace all full state names with their respective abbreviations
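The state recommendation amounts to a lookup from full name to abbreviation. A minimal sketch follows; it assumes the addresses use Australian states and territories (Sprocket Central is an Australian "Pty Ltd" company) and that the column is stored under a hypothetical `state` key. Any full name not in the mapping is left unchanged apart from trimming.

```python
# Assumption: the address data uses Australian states and territories;
# add any other full names that actually occur in the dataset.
STATE_ABBREVIATIONS = {
    "New South Wales": "NSW",
    "Victoria": "VIC",
    "Queensland": "QLD",
    "South Australia": "SA",
    "Western Australia": "WA",
    "Tasmania": "TAS",
    "Australian Capital Territory": "ACT",
    "Northern Territory": "NT",
}


def normalise_state(value):
    """Map a full state name to its abbreviation; other values are only trimmed."""
    cleaned = value.strip()
    return STATE_ABBREVIATIONS.get(cleaned, cleaned)


def normalise_states(rows, column="state"):
    """Return copies of the rows with the state column abbreviated."""
    return [{**row, column: normalise_state(row[column])} for row in rows]
```

Keeping the mapping in one dictionary makes it easy to review and to extend when a new variant turns up.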
Ensuring Data Quality in the Transactions Dataset
● Remove blank entries in the ‘Online order’ and ‘Brand’ columns
● In the list_price column, monetary values are stored in plain number format
Recommendation: Apply the Currency format to the column
● In the product_first_sold_date column, dates appear as bare numbers (Excel date serial values) that are meaningless as displayed
Recommendation: Convert the numbers to the Short Date format
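The three Transactions fixes can be sketched in one pass. This is a minimal illustration, assuming the hypothetical keys `online_order`, `brand`, `list_price` and `product_first_sold_date`, and a `$` currency symbol. Excel's Currency format changes only how a number is displayed, so the sketch keeps the numeric `list_price` and adds a formatted copy under a hypothetical `list_price_display` key. Serial numbers are converted using Excel's 1900 date system.

```python
from datetime import date, timedelta

# Counting from 30 Dec 1899 absorbs Excel's fictitious 29 Feb 1900, so the
# conversion is correct for serials >= 61 (1 Mar 1900 onwards).
EXCEL_EPOCH = date(1899, 12, 30)
REQUIRED_COLUMNS = ("online_order", "brand")  # assumed column names


def excel_serial_to_date(serial):
    """Convert an Excel serial number (e.g. 43831 -> 2020-01-01) to a date."""
    return EXCEL_EPOCH + timedelta(days=int(float(serial)))


def format_currency(amount, symbol="$"):
    """Render a number as currency text, e.g. 1234.5 -> '$1,234.50'."""
    return f"{symbol}{float(amount):,.2f}"


def is_blank(value):
    """True for None or for strings that are empty after trimming."""
    return value is None or (isinstance(value, str) and not value.strip())


def clean_transactions(rows):
    cleaned = []
    for row in rows:
        # Drop transactions with a blank online_order or brand.
        if any(is_blank(row.get(col)) for col in REQUIRED_COLUMNS):
            continue
        row = dict(row)
        # Keep the numeric price for analysis; add a currency-formatted copy.
        row["list_price_display"] = format_currency(row["list_price"])
        # Replace the serial number with a real date.
        serial = row.get("product_first_sold_date")
        if not is_blank(serial):
            row["product_first_sold_date"] = excel_serial_to_date(serial)
        cleaned.append(row)
    return cleaned
```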
Ensuring Data Quality in the NewCustomerList Dataset
● In the ‘past_3_years_bike_related_purchase’ and ‘Postcode’ columns, the data type is wrong: numbers are stored as text
Recommendation: Select each entire column and convert it to a number
● Remove blank entries in the ‘DOB’ and ‘Job Title’ columns
● In the ‘Property Valuation’ column, numbers are stored as text and show decimal places once converted
Recommendation: Select the entire column, convert it to a number, then decrease the decimal places
● Remove the unwanted columns generated by a random function
● Name the column that holds the RANK() results ‘Rank’
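The NewCustomerList steps can be sketched the same way. This is a minimal sketch under several assumptions: the hypothetical keys `past_3_years_bike_related_purchase`, `postcode`, `property_valuation`, `DOB` and `job_title`; random-function columns passed in by name; and a hypothetical value column to rank on. Numeric text is rounded half-up to a whole number, which mirrors decreasing the decimals to zero in Excel. Ranking follows the descending, ties-share-a-rank behaviour of Excel's RANK() (for example 1, 2, 2, 4).

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed column names, for illustration only.
NUMERIC_TEXT_COLUMNS = ("past_3_years_bike_related_purchase", "postcode", "property_valuation")
REQUIRED_COLUMNS = ("DOB", "job_title")


def to_whole_number(text):
    """Parse numeric text such as ' 8.00 ' and round half-up to an int."""
    return int(Decimal(str(text).strip()).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def is_blank(value):
    """True for None or for strings that are empty after trimming."""
    return value is None or (isinstance(value, str) and not value.strip())


def add_rank(rows, value_col, rank_col="Rank"):
    """Descending rank; equal values share a rank, as Excel's RANK() does."""
    values = [row[value_col] for row in rows]
    return [{**row, rank_col: 1 + sum(v > row[value_col] for v in values)} for row in rows]


def clean_new_customers(rows, random_columns, value_col):
    cleaned = []
    for row in rows:
        # Drop records with a blank DOB or job title.
        if any(is_blank(row.get(col)) for col in REQUIRED_COLUMNS):
            continue
        # Remove the columns produced by the random function.
        row = {k: v for k, v in row.items() if k not in random_columns}
        # Convert numbers stored as text into whole numbers.
        for col in NUMERIC_TEXT_COLUMNS:
            row[col] = to_whole_number(row[col])
        cleaned.append(row)
    return add_rank(cleaned, value_col)
```

The quadratic ranking loop is fine for a customer list of this size; a sort-based version would scale better. Converting postcodes to integers drops any leading zero, so keeping them as text may be safer if such postcodes occur.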