Unlike data acquisition, which can grow exponentially, data error correction is generally handled on an exception basis, using manual resources that scale linearly at best. Manual correction cannot keep pace with our increased data volumes, which means we must automate our data quality processes with tools at least as robust as our data collection and storage resources. We cannot afford to scale up the human resources that today correct perhaps 100 customer name, address, part number or shipment date errors per week to handle thousands or tens of thousands of such errors. Our only alternative is to automate, catching and fixing those errors up front.
In a previous post (“Big Silos”) I posited three primary sources of big data: visually dense (e.g. video, satellite), temporally dense (e.g. audio, sensor), and transactions (e.g. POS, SKUs). I would like to amend that classification to now include a fourth primary source of big data that I had initially overlooked: unstructured data, including social media data.
By commonly cited estimates, around 80% of corporate data is unstructured, making it perhaps the single largest potential source of big data. As we start to process, structure and store that unstructured data, through techniques such as text analytics and content categorization, we need to remember to apply the same data quality standards as we do to more traditional transactional data.
Example: A prime candidate for the application of text analytics to derive value (e.g. for enterprise risk management) would be customer and vendor contracts. After extracting and categorizing the usual suspects – party names, addresses, effective dates, etc. – there is a wealth of valuable data lurking in the main body, addenda, amendments and attached statements of work, addressing issues such as:
- What’s my maximum limitation of liability? What’s my cumulative limitation of liability for similar products or territories? Same set of questions with regard to penalties for missing SLAs.
- Do I have all my customer warranties and representations backed off with the appropriate supplier? Same set of questions with regard to insurance requirements.
- For which marginal or unprofitable deals does the customer have the option to extend, and for how long, on which products / services?
- Can I visualize the various exclusivity clauses I have with dealers/resellers/vendors/customers with respect to territories and specific products, and their expiration dates?
To make a data quality initiative for unstructured data work, you need to understand how your business users are going to interact with the data, which in most cases will start with a “search” and then extend into analysis of specific associated terms / fields / categories. Data definitions will need to be just as rigorous as for transaction data: the definition of ‘Limitation of Liability’, for example, has to cover its associated elements, such as amount, time periods, exceptions, and exceptions to exceptions (e.g. “Notwithstanding the exceptions listed in section 4a, …”). Developing the business rules will likewise come from an in-depth understanding of the user requirements: warranties will be associated with both products and services, and products will have part numbers, which via a bill of materials (BOM) can be related back to vendor part numbers for the various sub-components.
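To make the idea of a rigorous data definition concrete, here is a minimal Python sketch of how an extracted ‘Limitation of Liability’ clause might be modeled and checked. Everything in it is an assumption for illustration only: the class names `LimitationOfLiability` and `Carveout`, the field names, and the three rules in `validate` are hypothetical, not taken from any particular text analytics or data quality product. Real business rules would come from your own user requirements.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import List, Optional


@dataclass
class Carveout:
    """An exception to the clause; it may carry its own exceptions."""
    section: str  # contract section the exception refers to, e.g. "4a"
    text: str
    exceptions: List["Carveout"] = field(default_factory=list)


@dataclass
class LimitationOfLiability:
    """Hypothetical structured record for one extracted clause."""
    contract_id: str
    amount: Optional[Decimal] = None
    currency: Optional[str] = None
    period_start: Optional[date] = None
    period_end: Optional[date] = None
    exceptions: List[Carveout] = field(default_factory=list)


def validate(clause: LimitationOfLiability) -> List[str]:
    """Return a list of data quality problems (empty if none found)."""
    problems = []

    # Rule 1 (illustrative): a liability cap needs a positive amount
    # and a currency to be usable in analysis.
    if clause.amount is None:
        problems.append("missing amount")
    elif clause.amount <= 0:
        problems.append("amount must be positive")
    elif not clause.currency:
        problems.append("amount has no currency")

    # Rule 2 (illustrative): the time period must not run backwards.
    if (clause.period_start and clause.period_end
            and clause.period_end < clause.period_start):
        problems.append("period ends before it starts")

    # Rule 3 (illustrative): every exception, including exceptions to
    # exceptions, must point at a contract section.
    stack = list(clause.exceptions)
    while stack:
        carveout = stack.pop()
        if not carveout.section.strip():
            problems.append("exception without a section reference")
        stack.extend(carveout.exceptions)

    return problems
```

A clause that fails any rule would be routed to review before it reaches the search index or the analytics layer, which is the "catch it up front" approach applied to contract data.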
Social media will be a similar minefield of data quality issues. Initially, social media is not directly a lead-generation tool. It starts out merely as relationship-generation, and the various names, nicknames, pseudonyms, handles, hashtags, shares, retweets and associated URLs and email addresses are probably not a target for a data quality initiative.
At some point, however, these social relationships are going to need to move into the real world as actual leads. Sales is going to want a REAL name and a viable email address. Your CRM system will need to capture social interactions the way it does for direct marketing, with the social media profile as an invaluable piece of the total customer view. Two important factors in doing so will be the integration of CRM and social media platforms, and whether or not the social media platform is even capturing what CRM needs. Whatever data is being passed into CRM will require the usual data quality treatment.
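As a rough illustration of that "usual data quality treatment", the Python sketch below screens a social media contact before it is handed to CRM. The `SocialContact` record, its fields, and the two rules are hypothetical and not drawn from any real CRM or social platform API. The email test only checks the general shape of an address; it does not prove the address is deliverable.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Shape check only: something@something.something, no spaces.
EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class SocialContact:
    """Hypothetical record captured from a social media platform."""
    handle: str
    display_name: Optional[str] = None
    email: Optional[str] = None


def crm_readiness_issues(contact: SocialContact) -> List[str]:
    """List the reasons a contact is not yet usable as a CRM lead."""
    issues = []

    # A missing name, or one that just repeats the handle, is not
    # something Sales can use.
    name = (contact.display_name or "").strip()
    bare_handle = contact.handle.lstrip("@").lower()
    if not name or name.startswith("@") or name.lower() == bare_handle:
        issues.append("no real name")

    email = (contact.email or "").strip()
    if not EMAIL_SHAPE.match(email):
        issues.append("no viable email address")

    return issues
```

Contacts that come back with an empty list could flow into CRM automatically, while the rest stay in the relationship-building stage until the missing pieces are captured.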
The biggest gulf among data silos is not that between, say, structured customer CRM and structured product ERP data, but between our structured and unstructured data. Avoiding big silos of bad data will not be easy, but it can be tackled with a focus on both data integration and data quality.
By Leo Sadovy, EPM Channel Contributor, from: http://blogs.sas.com/content/valuealley/2015/07/14/big-data-demands-big-quality/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+ValueAlley+%28Value+Alley%29
Leo Sadovy handles marketing for Performance Management at SAS, which includes the areas of budgeting, planning and forecasting, activity-based management, strategy management, and workforce analytics, and advocates for bringing SAS’ best-in-class analytics capability into the office of finance across all industry sectors. Before joining SAS, he spent seven years as Vice-President of Finance for Business Operations for a North American division of Fujitsu, managing a team focused on commercial operations, customer and alliance partnerships, strategic planning, process management, and continuous improvement. During his 13-year tenure at Fujitsu, Leo developed and implemented the ROI model and processes used in all internal investment decisions, and also held senior management positions in finance and marketing. Prior to Fujitsu, Sadovy was with Digital Equipment Corporation for eight years in sales and financial management. He started his management career in laser optics fabrication for Spectra-Physics and later moved into a finance position at the General Dynamics F-16 fighter plant in Fort Worth, Texas. He has an MBA in Finance and a Bachelor’s degree in Marketing. He and his wife Ellen live in North Carolina with their three college-age children, and among his unique life experiences he can count a run for U.S. Congress and two singing performances at Carnegie Hall.