There’s a tremendous amount of noise in the market around “big data”, but what’s the value in collecting data about everything, all the time? Having the data is, after all, of little business benefit unless you know what to do with it quickly enough to make a real difference to your business.
Think of some simple examples you may have seen recently. Have you gone into a shop only to find that an advertised product was unavailable? Surely, you think, this business tracks its sales, inventory, shipments and ad responses, and uses that information to improve efficiency. Unfortunately, because that data isn’t available or analysed until days or weeks after it is collected, you still left the shop empty-handed. If the retailer had been able to see these trends in real time, perhaps they could have detected the spike in queries for the product, noticed the low inventory levels, and dispatched an on-demand shipment to replenish stock!
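The retailer scenario boils down to two signals checked together: a burst of queries for a product and a stock level at or below a reorder point. What follows is a minimal Python sketch under stated assumptions: a single product, a sliding one-hour window, timestamps in seconds that arrive in order, and a typical query rate that is already known. The class name `ReplenishmentMonitor` and every threshold are hypothetical choices for illustration, not part of any real retail system.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ReplenishmentMonitor:
    """Hypothetical real-time trigger for one product (illustrative only)."""
    baseline_per_window: float       # assumed typical number of queries per window
    window_seconds: float = 3600.0   # sliding window length (assumed: one hour)
    spike_factor: float = 2.0        # a "spike" is this many times the baseline
    reorder_level: int = 10          # stock at or below this counts as "low"
    stock: int = 0
    _queries: deque = field(default_factory=deque, init=False, repr=False)

    def record_query(self, timestamp: float) -> None:
        # Assumes timestamps arrive in non-decreasing order.
        self._queries.append(timestamp)
        self._expire(timestamp)

    def update_stock(self, units: int) -> None:
        self.stock = units

    def _expire(self, now: float) -> None:
        # Drop queries that have fallen out of the sliding window.
        while self._queries and self._queries[0] <= now - self.window_seconds:
            self._queries.popleft()

    def should_replenish(self, now: float) -> bool:
        # Both conditions must hold: unusual demand AND low stock.
        self._expire(now)
        spiking = len(self._queries) >= self.spike_factor * self.baseline_per_window
        low_stock = self.stock <= self.reorder_level
        return spiking and low_stock


monitor = ReplenishmentMonitor(baseline_per_window=5, stock=8)
for minute in range(12):                     # twelve queries, one a minute
    monitor.record_query(minute * 60.0)
print(monitor.should_replenish(now=720.0))   # → True
```

Keeping only the last hour of queries means each check looks at a small, recent slice of data, which is the kind of timely analysis the example calls for; deciding what counts as a "normal" baseline is the harder, business-specific part.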
The value in big data is not in the data; it’s in what you can do with it.
If your business is considering investing in big data, it’s worth first asking how relevant that data is to your success. In an ever-changing global marketplace, the usefulness of data can change dramatically over a very short period of time.
Big data is often described in terms of its “V’s”, among them volume, velocity, veracity and value. Many big data projects so far have focused purely on gathering the greatest volume at the highest velocity. Placing too little emphasis on veracity and value often means that the results you glean from the data amount to little more than noise, and are of no real commercial value to your business.
Smart data focuses less on volume and velocity and much more on veracity and value, filtering out the noise and bulk to drill down to the nuggets of data gold that will really revolutionise your business. Working with a smaller volume, gathered at a lower velocity, also makes the data easier to analyse quickly, so that you can react quickly. Even so, searching every source, filtering the data you gather and analysing it promptly is not enough on its own.
To be genuinely useful, data needs to be understood in context. Looking at how long the average visitor stays on a web page, for instance, is unlikely to be very useful on its own. We know, for example, that the average page view lasts around 59 seconds. We could judge the success of a given landing page by how it performs against this magic number, and likely understand… nothing. If, on the other hand, we look at this time in the context of the past performance of similar pages, taking into account current marketing activity, the traffic source and the user’s next hop, we could begin to understand what’s really happening. We’d have achieved some value; and if we knew that the users who went to the page all did so intentionally, and found what they expected to find there, we’d begin to have some veracity. Then we can begin to ask some questions. Did users leave quickly because they didn’t like the page? Because it had too much information? Too little?
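To make the dwell-time point concrete, here is one hypothetical way to score a page against the past performance of similar pages, split by traffic source rather than compared with a single global average. It assumes you already hold historical dwell times (in seconds) for each source; the function names, the choice of a z-score and the sample numbers are illustrative assumptions, not a recommended methodology.

```python
from statistics import mean, stdev


def dwell_time_zscore(observed: float, history: list[float]) -> float:
    """How many standard deviations `observed` sits from past similar pages."""
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    spread = stdev(history)  # sample standard deviation
    if spread == 0:
        raise ValueError("historical values have no spread")
    return (observed - mean(history)) / spread


def contextualise(observed_by_source: dict[str, float],
                  history_by_source: dict[str, list[float]]) -> dict[str, float]:
    """Score each traffic source against its own history (sources without history are skipped)."""
    return {
        source: dwell_time_zscore(seconds, history_by_source[source])
        for source, seconds in observed_by_source.items()
        if source in history_by_source
    }


history = {"email": [80, 90, 100], "paid_search": [30, 40, 50]}  # made-up data
print(contextualise({"email": 60, "paid_search": 60}, history))
# → {'email': -3.0, 'paid_search': 2.0}
```

In this made-up example, the same 60-second visit looks poor for email visitors but strong for search visitors, which is precisely the distinction a single "magic number" hides.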
The focus cannot be on the collection of the data; it has to be on contextualising the data. The real challenge isn’t finding better ways to manage your data, it’s finding better ways to learn from it!
If your big data project is focused on gathering more data, storing it more quickly and map-reducing it faster, you may have missed the point. It’s not that big data isn’t useful; it’s that it won’t necessarily tell you anything useful. Looking for the fastest provider with the most storage isn’t necessarily going to help either.
Approaching your data science intelligently means focusing on smart data. It involves looking more deeply at your business objectives, then deciding whether to gather more data that could be used intelligently to improve your business, or whether the answer is simply to work with the data you already have in better, more intelligent ways. Gathering data from many different sources and correlating it by context is often the first step to improving veracity and value, which means working with a provider that understands your whole business rather than focusing on individual systems.