{"id":4633,"date":"2010-08-23T20:33:00","date_gmt":"2010-08-23T20:33:00","guid":{"rendered":"http:\/\/www.smartdatacollective.com\/index.php\/post\/real-data-value-business-insight\/"},"modified":"2010-08-23T20:33:00","modified_gmt":"2010-08-23T20:33:00","slug":"real-data-value-business-insight","status":"publish","type":"post","link":"https:\/\/www.smartdatacollective.com\/real-data-value-business-insight\/","title":{"rendered":"The Real Data Value is Business Insight"},"content":{"rendered":"<p><img loading=\"lazy\" loading=\"lazy\" decoding=\"async\" style=\"display: inline; margin-left: 0px; margin-right: 0px; border-width: 0px;\" title=\"Data Values for COUNTRY\" src=\"http:\/\/www.ocdqblog.com\/resource\/WindowsLiveWriter-BusinessInsightisDataValue_F7B3-?fileId=8235805\" border=\"0\" alt=\"Data Values for COUNTRY\" width=\"404\" height=\"351\" align=\"left\" \/> Understanding your <a title=\"The First Law of Data Quality by Jim Harris on the DataFlux Community of Experts\" href=\"http:\/\/www.dataflux.com\/dfblog\/?p=1458\" data-wpel-link=\"external\" rel=\"external noopener noreferrer ugc\">data usage<\/a> is essential to improving its quality, and therefore, you must perform data analysis on a regular 
basis.<\/p>\n<p>A <a title=\"OCDQ Blog Popular Content: Adventures in Data Profiling\" href=\"http:\/\/www.ocdqblog.com\/adventures-in-data-profiling\/\" data-wpel-link=\"external\" rel=\"external noopener noreferrer ugc\">data profiling<\/a> tool can help you by automating some of the grunt work needed to begin your data analysis, such as generating levels of statistical summaries supported by drill-down details, including data value frequency distributions (like the ones shown to the left).<\/p>\n<p>However, a common mistake is to hyper-focus on the data values.<\/p>\n<p>Narrowing your focus to the values of individual fields is a mistake when it causes you to lose sight of the wider context of the data, which can cause other errors like <a title=\"Data Quality and the Cupertino Effect\" href=\"http:\/\/www.ocdqblog.com\/home\/data-quality-and-the-cupertino-effect.html\" data-wpel-link=\"external\" rel=\"external noopener noreferrer ugc\">mistaking validity for accuracy<\/a>.<\/p>\n<p>Understanding data usage is about analyzing its most important context\u2014how your data is being used to make business decisions.<\/p>\n<p>&nbsp;<\/p>\n<h2>\u201cBegin with the decision in mind\u201d<\/h2>\n<p>In his excellent recent blog post <a title=\"It&#039;s time to industrialize analytics by James Taylor on the SmartData Collective\" href=\"http:\/\/smartdatacollective.com\/jamestaylor\/26598\/its-time-industrialize-analytics\" target=\"_blank\" data-wpel-link=\"internal\"><em>It\u2019s time to industrialize analytics<\/em><\/a>, <a title=\"James Taylor on Everything Decision Management\" href=\"http:\/\/jtonedm.com\/\" target=\"_blank\" data-wpel-link=\"external\" rel=\"external noopener noreferrer ugc\">James Taylor<\/a> wrote that \u201corganizations need to be much more focused on directing analysts towards business problems.\u201d&nbsp; Although Taylor was writing about how, in advanced analytics (e.g., data mining, predictive analytics), \u201cthere is a tendency to 
let analysts explore the data, see what can be discovered,\u201d I think this tendency is applicable to all data analysis, including less advanced analytics like data profiling and data quality assessments.<\/p>\n<p>Please don\u2019t misunderstand\u2014Taylor and I are <strong><em>not<\/em><\/strong> saying that there is no value in data exploration, because, without question, it can definitely lead to meaningful discoveries.&nbsp; And I continue to advocate that the goal of data profiling is not to find answers, but instead, to discover the right questions.<\/p>\n<p>However, as Taylor explained, it is because \u201cthe only results that matter are business results\u201d that data analysis should always \u201cbegin with the decision in mind.&nbsp; Find the decisions that are going to make a difference to business results\u2014to the metrics that drive the organization.&nbsp; Then ask the analysts to look into those decisions and see what they might be able to predict that would help make better decisions.\u201d<\/p>\n<p>Once again, although Taylor is discussing predictive analytics, this cogent advice should guide all of your data analysis.<\/p>\n<p>&nbsp;<\/p>\n<h2>The Real Data Value is Business Insight<\/h2>\n<p><img loading=\"lazy\" loading=\"lazy\" decoding=\"async\" style=\"display: inline; margin-left: 0px; margin-right: 0px; border-width: 0px;\" title=\"The Data Value is Business Insight\" src=\"http:\/\/www.ocdqblog.com\/resource\/WindowsLiveWriter-BusinessInsightisDataValue_F7B3-?fileId=8235806\" border=\"0\" alt=\"The Real Data Value is Business Insight\" width=\"404\" height=\"244\" align=\"left\" \/><\/p>\n<p>Returning to data quality assessments, which create and monitor metrics based on summary statistics provided by data profiling tools (like the ones shown in the mockup to the left), elevating what are low-level technical metrics up to the level of business relevance will often establish their <em>correlation<\/em> with business performance, but 
will not establish metrics that drive\u2014or should drive\u2014the organization.<\/p>\n<p>Although built from the bottom up by using, for the most part, the data value frequency distributions, these metrics lose sight of the top-down fact that business insight is where the real data value lies.<\/p>\n<p>However, data quality metrics such as completeness, validity, accuracy, and uniqueness, which are just a few common examples, should definitely be created and monitored\u2014unfortunately, a single straightforward metric called <em>Business Insight<\/em> doesn\u2019t exist.<\/p>\n<p>But let\u2019s pretend that my other mockup metrics were real\u201450% of the data is inaccurate and there is an 11% duplicate rate.<\/p>\n<p>Oh, no!&nbsp; The organization must be teetering on the edge of oblivion, right?&nbsp; Well, 50% accuracy does <em>sound<\/em> really bad, basically like your data\u2019s accuracy is no better than flipping a coin.&nbsp; However, which data is inaccurate, and, far more important, is the inaccurate data actually being used <em>to make a business decision?<\/em><\/p>\n<p>As for the duplicate rate, I am often surprised by the visceral reaction it can trigger, such as: \u201chow can we possibly claim to truly understand who our most valuable customers are if we have an 11% duplicate rate?\u201d<\/p>\n<p>So, would reducing your duplicate rate to only 1% <em>automatically<\/em> result in better customer insight?&nbsp; Or would it simply mean that the data matching criteria were too conservative (e.g., requiring an exact match on all \u201ccritical\u201d data fields), preventing you from discovering how many duplicate customers you have?&nbsp; (Or maybe the 11% indicates the matching criteria were too aggressive.)<\/p>\n<p>My point is that accuracy and duplicate rates are <em>just numbers<\/em>\u2014what determines whether they are good numbers or bad numbers?<\/p>\n<p>The fundamental question that every data quality metric you create must answer is: <em>How 
does this provide business insight?<\/em><\/p>\n<p>If a data quality (or any other data) metric cannot answer this question, then it is meaningless.&nbsp; Meaningful metrics always represent business insight because they were created by beginning with the business decisions in mind.&nbsp; Otherwise, your metrics could provide the comforting, but false, impression that all is well, or they could raise red flags that are really <a title=\"Red Flag or Red Herring?\" href=\"http:\/\/www.ocdqblog.com\/home\/red-flag-or-red-herring.html\" data-wpel-link=\"external\" rel=\"external noopener noreferrer ugc\">red herrings<\/a>.<\/p>\n<p>Instead of beginning data analysis with the business decisions in mind, many organizations begin with only the data in mind, which results in creating and monitoring data quality metrics that provide little, if any, business insight and decision support.<\/p>\n<p>Although analyzing your data values is important, you must always remember that <em>the real data value<\/em> is <em>business insight<\/em>.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Understanding your data usage is essential to improving its quality, and therefore, you must perform data analysis on a regular 
basis.<\/p>\n","protected":false},"author":56,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_seopress_robots_primary_cat":"","_seopress_titles_title":"","_seopress_titles_desc":"","_seopress_robots_index":"","footnotes":""},"categories":[4],"tags":[],"class_list":{"0":"post-4633","1":"post","2":"type-post","3":"status-publish","4":"format-standard","6":"category-data-mining"},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.smartdatacollective.com\/wp-json\/wp\/v2\/posts\/4633","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.smartdatacollective.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.smartdatacollective.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.smartdatacollective.com\/wp-json\/wp\/v2\/users\/56"}],"replies":[{"embeddable":true,"href":"https:\/\/www.smartdatacollective.com\/wp-json\/wp\/v2\/comments?post=4633"}],"version-history":[{"count":0,"href":"https:\/\/www.smartdatacollective.com\/wp-json\/wp\/v2\/posts\/4633\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.smartdatacollective.com\/wp-json\/wp\/v2\/media?parent=4633"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.smartdatacollective.com\/wp-json\/wp\/v2\/categories?post=4633"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.smartdatacollective.com\/wp-json\/wp\/v2\/tags?post=4633"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}