Isaac Sacolick
President and CIO

Isaac Sacolick is the President and CIO of StarCIO, providing executive management services for digital transformation and leading-edge IT practices. Previously, he led the delivery of Greenwich Associates' business transformation initiative, including implementing new business intelligence platforms, upgrading the company's CRM processes, and spearheading a digital transformation of marketing practices. Prior to Greenwich Associates, Sacolick was the CIO of McGraw Hill Construction (now Dodge Data & Analytics), a leading provider of data, analytics, news, and intelligence serving the North American construction industry, and the CIO of BusinessWeek Magazine. He was also founder and COO at TripConnect, a travel industry social network, and CTO at PowerOne Media, a SaaS provider to the newspaper industry. Sacolick has been recognized as a top 100 social CIO, blogger, and industry speaker, and he blogs at Social, Agile, and Transformation.

This article is by Featured Blogger Isaac Sacolick from his blog Social, Agile, and Transformation.

It's been almost two years since I last blogged about the threats and opportunities around Dark Data, so when I was asked to join Joseph Bradley on Cisco's Future of IT podcast, Navigating Dark Data To Find Hidden Value in a Digital Era, I couldn't resist.

To remind everyone, I provided the following definition of Dark Data in my post:

Dark data is data and content that exists and is stored, but is not leveraged and analyzed for intelligence or used in forward looking decisions - Isaac Sacolick, see full definition

Since then, the subject of dark data has been covered in The Dangers of Dark Data, by Forbes (Factories of the Future), and by VentureBeat (Are you afraid of the dark data?). VentureBeat points to the cause of the problem:

As storage has become cheaper, those who generate data have grown used to hanging onto it... When data is “dark,” it’s often because the organizations that own it lack the tools, infrastructure, or skills to effectively leverage it - John Joseph

Internal Dark Data, that is, dark data already captured and stored by the enterprise but not leveraged to drive insights or decision making, represents both a threat and a possible missed opportunity. The threat arises when storing the data degrades the performance of a key business operation, or when the data contains sensitive information that should be better secured. The opportunity is to determine the value of retaining and processing this data.

Determining the Business Value of Stored, Dark Data

One of the themes we discussed during the podcast is developing the discipline of identifying the business value in dark data. To do this, use basic data visualization, analytics, and quality tools to identify the substance of the data, then work through the following questions and steps:

  • How should the data be cataloged so that business users can learn of its existence? Can the data be broken down into basic entities, dimensions, metrics, and volumes to provide more detail to business users looking for data sources?
  • Identify 3-5 potential questions, insights, decisions, or activities that can be researched using this data should someone commission a data scientist to investigate.
  • Also identify "known issues" with the data source. This can be measures of data quality, information on how the data is sourced, and other feedback that might undermine any analysis of the data.
  • Have a Data Governance board "score" this data set based on its potential vs. known issues. Absent an easy way to quantify value, scoring by a voting committee can at least rank which data sets look attractive for further analysis.
  • Commission time-boxed studies (aka, agile sprints!) on data sets that have the highest scores. Review results and re-rank based on findings. (Note: See my post, Best Data Visualization Practices in Self Service BI Programs for some ideas on how to implement.)
  • Make sure that you have disciplined, agile data scientists. Have them demo their findings to the Data Governance board and adjust the score based on what was discovered.
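
The cataloging, scoring, and re-ranking steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the DarkDataSet class, its field names, the 1 to 5 voting scale, the "average potential minus average issue severity" formula, and the sample data sets are all hypothetical choices, not something prescribed in the podcast or used by any particular governance board.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class DarkDataSet:
    """Hypothetical catalog entry for one dark data source."""
    name: str
    entities: List[str]             # basic entities found in the data
    potential_questions: List[str]  # the 3-5 questions worth researching
    known_issues: List[str]         # quality or sourcing concerns
    # Board votes; the 1 (low) to 5 (high) scale is an assumption.
    potential_votes: List[int] = field(default_factory=list)
    issue_votes: List[int] = field(default_factory=list)

    def score(self) -> float:
        """Average potential minus average issue severity (assumed formula)."""
        if not self.potential_votes or not self.issue_votes:
            raise ValueError(f"{self.name} has not been voted on yet")
        return mean(self.potential_votes) - mean(self.issue_votes)


def rank(data_sets: List[DarkDataSet]) -> List[DarkDataSet]:
    """Order data sets from most to least attractive for a time-boxed study."""
    return sorted(data_sets, key=lambda d: d.score(), reverse=True)


# Illustrative entries; names and votes are made up.
logs = DarkDataSet(
    name="web_server_logs",
    entities=["session", "page"],
    potential_questions=["Which pages precede a cancellation?"],
    known_issues=["Bot traffic is not filtered"],
    potential_votes=[4, 5, 4],
    issue_votes=[2, 1, 2],
)
tickets = DarkDataSet(
    name="support_tickets",
    entities=["ticket", "customer"],
    potential_questions=["Which products generate repeat tickets?"],
    known_issues=["Free-text fields", "Duplicate customers"],
    potential_votes=[3, 3, 4],
    issue_votes=[3, 4, 3],
)

ranked = rank([logs, tickets])
print([d.name for d in ranked])

# After a sprint demo, the board re-votes and the ranking is recomputed.
tickets.potential_votes = [5, 5, 5]
tickets.issue_votes = [2, 2, 2]
reranked = rank([logs, tickets])
print([d.name for d in reranked])
```

A simple difference of averages is used here only because it is easy for a voting committee to reason about; a board could just as well weight potential and known issues differently.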

As I said in the podcast, I suggest data scientists look at the value of the data before investing too much time in discovery. Imagine that all the friction in the analysis caused by the data set's size, speed, complexity, variety, or quality can be "solved" given sufficient skills, tools, and time. What do you then hope to get out of this data analytics or mining exercise?

For additional insights, see my post 10 Attributes of Data Driven Organizations.

Originally posted on Social, Agile, and Transformation. Other featured posts by Isaac Sacolick: Ten Ways to Improve IT Culture: Agile, DevOps, Data, and Collaboration; CIO’s Five Predictions for 2015