What is Your Data Quality Score?


by Michael McDonald and Kim Chandler McDonald, Co-Founders of FlatWorld™ Integration

“Quality is more important than quantity.” - Steve Jobs

If Samuel Taylor Coleridge were alive today, perhaps, rather than bemoaning “Water, water everywhere, Nor any drop to drink,” he’d be delving into the data drama. We’re drowning in data, yet so little of it is ‘drinkable’ - too much of it is of little to no use, providing little to no actionable acumen. Knowing the quality of your data will go a long way to ensuring you don’t sink under the deluge. Keep reading to find out why it’s important, and how easy it is to do.

The fact of the matter is, regardless of the quality of your insight-generating algorithms, the old IT proverb, ‘garbage in, garbage out’, holds true. If you don’t know the quality of your data then you not only run the risk of offering poor service but, worse than that, of offering the wrong service to your clients, customers and colleagues.

Data is replicated all over your company. The same person (be they client, customer or colleague) can have the same type of data (product data, name and address data) stored in scores of systems and a multitude of spreadsheets, and it will, undoubtedly, go into a wide number of reports.

Unfortunately for most companies, this data can be inconsistent, incomplete, out-of-date, or worse, plain wrong.  Moreover, this same data is likely being rolled into processes that are feeding your decision making or, increasingly, into your machine-learning algorithms. Which leads us back to the basic problem: ‘garbage in, garbage out’. Simply put, you need to measurably separate your data into data that works for you and data that works against you. 

Measuring your data is surprisingly easy to implement. It can be as simple as adding a confidence or quality ranking from 1 to 5 to your data (1 being unknown and/or unverified and 5 being completely accurate and/or verified).
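As a minimal sketch of what such a ranking might look like in practice, the Python below attaches a 1 to 5 score to each stored value. The names `ScoredValue` and `quality`, and the sample record, are illustrative assumptions rather than part of any particular product.

```python
from dataclasses import dataclass

# Hypothetical labels for the two ends of the scale described above.
UNKNOWN = 1    # unknown and/or unverified
VERIFIED = 5   # completely accurate and/or verified


@dataclass(frozen=True)
class ScoredValue:
    """A stored value together with its 1-5 quality score."""
    value: str
    quality: int = UNKNOWN  # treated as "unknown" until someone verifies it

    def __post_init__(self):
        # Reject scores outside the 1-5 scale.
        if not UNKNOWN <= self.quality <= VERIFIED:
            raise ValueError("quality must be between 1 and 5")


# Illustrative customer record: each field carries its own score.
customer = {
    "name": ScoredValue("A. Example", quality=5),
    "address": ScoredValue("1 Sample Street", quality=3),
    "phone": ScoredValue("not recorded"),
}
```

Scoring each field separately, rather than the record as a whole, is a design choice made for this sketch: it lets a verified name sit alongside an unverified phone number.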

This simple assignment of quality can be easily added into your machine-learning/analytics/reports. In doing so, you can confidently ask for “the data we really know about our customers,” and, from there, determine real, actionable and, most importantly, relevant outcomes that can drive happier, more valuable customers.
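One way to picture this is a report that filters on the score before it does anything else. The sketch below assumes each row carries a hypothetical `quality` field on the 1 to 5 scale; the function name `trusted_rows`, the threshold of 4 and the sample rows are invented for illustration.

```python
def trusted_rows(rows, min_quality=4):
    """Return only the rows whose quality score meets the threshold."""
    return [row for row in rows if row["quality"] >= min_quality]


# Invented sample data: one row per customer address.
rows = [
    {"customer": "C001", "address": "1 Sample Street", "quality": 5},
    {"customer": "C002", "address": "22 Example Road", "quality": 3},
    {"customer": "C003", "address": "", "quality": 1},
]

# "The data we really know about our customers"
known = trusted_rows(rows)
print([row["customer"] for row in known])  # prints ['C001']
```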

The alternative is being immersed in ‘good decisions based on bad data’ - the worst outcome possible for any investment in data-analytics/machine-learning.

In five easy steps you can create a quick benchmarking/scoring of your company’s data, which will inform everyone from operational to strategic management how much they can trust the data they are basing their decisions on.

The steps are:

  1. Identify your core information. For most companies this would encompass their customer profiles and associated product portfolios. Most companies will view customer service as being essential, so product use and any customer service related data can be defined as your core data elements.

  2. Identify your SoR (Source of Record). In most companies you will have more than one system storing your core information. You need to select one system for each core data element - it’s recommended that you choose the system that has the most up-to-date and accurate information.

  3. Scoring. You then need to score each data element (e.g. customer address) in your core data on a simple scale from 1 to 5. A score of 1 equates to “we don’t know”, and a 5 to “we know that the information is correct because we had the customers validate the address, we have cross-checked it with 3rd party data and the addresses are ‘real’, i.e. they work in Google Maps”. A score of 3 would be the norm for most SoR systems; this is because customers generally update this data themselves, so it is only as accurate and as up-to-date as what they have entered and ‘agreed’ to share with you.

  4. Measuring. You now have a simple measurement/score reflecting how much you can trust your SoR data. Now all you need to do is make sure you use this scoring data when generating your reports, so that different departments produce consistent reporting with a known quality of data.

  5. Validation. Machine-learning/deep learning (often referred to as AI) can require large amounts of data to ‘work its magic’, but it can generate poor results with poor data. Big data in no way guarantees Smart Data, so systemic data quality/validation processes need to be put in place for any SoR that scores a 3 or less. This means creating a method of checking your data for accuracy, timeliness (is it up-to-date?) and completeness (are we missing anything?). The result will be that your data a) will be of better quality and b) won’t go out-of-date, as would be the case if it were a ‘one-off’ fix.
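Steps 2 to 5 can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the system names, element names, scores and the one-year freshness window are assumptions chosen for the example, not recommendations. Accuracy checks against third-party data are left out, since they depend on the provider you use.

```python
from datetime import date, timedelta

# Steps 2-3: each core data element, its chosen SoR, and its 1-5 score.
# System names and scores are made up for illustration.
core_elements = {
    "customer_address": {"sor": "crm", "score": 3},
    "customer_name": {"sor": "crm", "score": 5},
    "product_holdings": {"sor": "billing", "score": 2},
}


def describe(name, elements):
    """Step 4: a label a report can print next to the data it uses."""
    element = elements[name]
    return f"{name} (source: {element['sor']}, quality {element['score']}/5)"


def needs_validation(elements, threshold=3):
    """Step 5: elements scoring 3 or less need an ongoing validation process."""
    return sorted(n for n, e in elements.items() if e["score"] <= threshold)


def check_record(record, required_fields, today, max_age_days=365):
    """Flag completeness and timeliness problems in one record."""
    issues = []
    for field in required_fields:
        if not record.get(field):  # completeness: missing or empty value
            issues.append(f"missing {field}")
    last_updated = record.get("last_updated")
    # Timeliness: no update date at all, or one older than the allowed window.
    if last_updated is None or today - last_updated > timedelta(days=max_age_days):
        issues.append("out of date")
    return issues


record = {"address": "", "last_updated": date(2020, 1, 1)}
print(describe("customer_address", core_elements))
print(needs_validation(core_elements))
print(check_record(record, ["address"], today=date(2024, 1, 1)))
```

Running a check like `check_record` on a schedule, rather than once, is what keeps the data from sliding back out of date.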

Data is cheap, information is invaluable. With your uplifted SoR data driving your analytics, reporting and any machine-learning/AI, your company will be set to begin the journey towards offering the sought-after omni-channel, hyper-personalised service you need to differentiate yourself in today’s global and hyper-competitive market landscape.