What is Dark Data?

Millennials may be rejecting materialism in favour of travel and experiences, but when it comes to data, make no mistake: we live in an age of collecting, storing, and hoarding.

The Internet of Things (IoT) now generates more data than social media, but is that data being used? We may be headed for a world where cloud-based artificial intelligence engines extract stunning insights from the big data we collect from the IoT, but we are not there yet. The statistics are staggering: 90% of the world's data has been created in only the last two years, and 80% of that is unstructured.

Much of this data, be it dates, numbers or text, metadata, transactional data or log files, becomes what is known as dark data. It is collected, lies dormant, and, according to analysts at IDC, 90% of it is never used. Much of it is forgotten so completely that its owners don't even know it exists. Sometimes it's collected for compliance reasons; HM Revenue & Customs insist that businesses keep their records for at least five years, for example. Often it contains confidential information, such as customer account details, and lives on indefinitely on servers, in cloud storage and in email accounts.

So dark data is risky, but it's also a missed opportunity. Businesses constantly hear that data is the new currency, and that the only competitive advantage left lies in big data. It's a safe bet, then, that the businesses that shine a light on their dark data will be the ones that do well: not by mining company emails from a decade ago, or by clearing out old downloads and zip files, but by actually putting some of the data they have collected to a specific use.

What about that customer satisfaction survey from last year that everyone forgot about, or the log of last year's overseas sales that no-one has analysed yet?

Data goes dark because we now collect far more of it than we can use. We're on the cusp of the age of the Internet of Things, so sensors (better thought of as 'data acquisition devices') that can collect real-time data on everything are configured to do just that. Cloud storage, too, has grown exponentially in recent years. I recently uploaded 2TB of old photos to the cloud in a folder marked 'to sort', to deal with someday; companies are doing exactly the same thing. But data collected for its own sake has no value; only analysing it, sensor data included, brings any benefit.

Dark data is partly an IT and resourcing issue. Who has the time to go hunting for dark data, with the aim not only of de-archiving and organising it, but of finding insight in it? Certainly not the IT staff, or even those in charge of data analytics, who largely lack the content management systems that would make such data easy to find and use. Data analytics tools are getting more powerful and easier to use, but they still need to be pointed at a coherent dataset.

Conversely, concentrating too heavily on building the backend IT to support experimental IoT projects is delaying insights that could be valuable. For example, using IoT sensors to predict machine failure and to pinpoint exactly where company assets are is a huge opportunity that could save millions of dollars. Collected but left to waste, dark data is pointless; put to use, it is potential gold dust.
