Tuesday 27 March 2012

Curbing [Data] Inflation

We are in the midst of a data deluge, according to The Economist. McKinsey are telling us that data volumes are increasing at a rate of 40% per year. Whilst many extol this as a great opportunity, at Bet Buddy we believe it is causing a phenomenon that we have termed Data Inflation, i.e. the risk that the value of the data an organisation captures and holds decreases as the supply of data increases.

 
How can this be so? Surely more data means more opportunities to analyse and understand consumer behaviour, and therefore to drive more personalised and targeted offers and campaigns? In theory this assumption makes sense; in practice it is very difficult to get right.

The data available to organisations to support marketing, operations and customer services, risk, and compliance activities can be broadly classified as Personal, Machine Generated and Social Network Data (although data privacy laws and policies certainly restrict the extent to which much of this data can be leveraged). Some data falls neatly within these categories - for example, we like Splunk’s definitions of machine generated data. Other data is open to interpretation: whilst some may categorise click-stream data as machine generated data, others argue click-stream records are personal data. Most of the data captured within an organisation is, however, never used - 75% remains dormant according to the Financial Times. Whilst this may sound like a lost opportunity, leaving data on the table can also make practical sense.

Knowing how to effectively capture and utilise the right data is the challenge of managing data inflation. Zynga sends about 5TB of data to its central store per day, which is about 10% of the data it collects - this covers game actions (personal data) but not log files (machine generated data). We estimate these daily player data volumes are >15x the core player data volumes that a medium-sized online gaming operator saves per year (core player data here covers the player, game, session and transaction files). Zynga is clearly a mammoth and generates data on a scale comparable to the world's largest 'big data' firms. It has over a quarter of a billion monthly active users on Facebook, dwarfing most other gambling and gaming firms. So whilst the infrastructure it has in place is unlikely to be applicable to most gambling firms, it is nevertheless a good organisation to examine a little more closely. Because of the scale of the data it generates and leverages, Zynga had to invest early in the tools, systems and processes that allow it to better manage the risks of data inflation. As data volumes have continued to increase exponentially, we very much doubt that companies such as Zynga relied purely on hiring staff to process and mine them; instead they prioritised, i.e. they didn't try to analyse everything at the same time. Whilst the magnitude of Zynga's big data challenges does not apply to most gaming and gambling operators, the principles do.

Curbing data inflation requires the organisation to think strategically across a number of key areas, e.g. storage, security, transportation, and analytics. We need to step back and start thinking about data within an organisation as an ecosystem of connected data sources, internal and external customers, tools and platforms, processes, and people. Capturing data as part of everyday business-as-usual processes can be very hard, and for the data you prioritise as important for analytics it means automating daily repetitive tasks and processes, such as:
  • Data sourcing – large gaming operators have multiple game offerings and back office account management systems, supporting multiple channels, usually from multiple vendors. This is a bigger challenge than pure data volumes for most gambling firms
  • Data cleansing – dealing with date/time stamp conversions, replacing commas in numerical values, treating missing data values, etc
  • Data transformation – core player value segmentation calculations, calculating predictor variable data for each predictive model deployed, etc (a short sketch of the cleansing and transformation steps follows this list)
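
By way of illustration, here is a minimal sketch in Python (using pandas) of the cleansing and transformation steps above. The file name, the column names (player_id, bet_time, stake) and the fill-with-zero policy for missing stakes are assumptions made for the example, not a description of any particular operator's data.

    import pandas as pd

    # Hypothetical CSV export of player transactions: 'bet_time' is a vendor-formatted
    # timestamp and 'stake' is exported with thousands separators, e.g. "1,250.00"
    raw = pd.read_csv("player_transactions.csv", dtype=str)

    # Date/time stamp conversion: parse the timestamp, coercing unparseable values to NaT
    raw["bet_time"] = pd.to_datetime(raw["bet_time"], errors="coerce")

    # Replace commas in numerical values so the stake can be cast to a float
    raw["stake"] = raw["stake"].str.replace(",", "", regex=False).astype(float)

    # Treat missing data values: drop rows with no usable timestamp and fill missing
    # stakes with zero (the right policy depends on the analysis being run)
    clean = raw.dropna(subset=["bet_time"]).copy()
    clean["stake"] = clean["stake"].fillna(0.0)

    # Data transformation: a daily stake total per player, the kind of predictor
    # variable a player value segmentation model might consume
    daily_stake = (
        clean.groupby(["player_id", clean["bet_time"].dt.date])["stake"]
             .sum()
             .reset_index(name="daily_stake")
    )
    print(daily_stake.head())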
So organisations need to focus on the data that will drive the most value. Whilst this sounds obvious, we believe that for any single business area or domain, the manager should typically not be presented with more than 15 – 20 data points to manage their business on a daily basis. Any more - think of very large and complex dashboards, countless MI reports and tables of data - risks a decrease in productivity and less focus on what really matters to the business, i.e. an increase in data inflation. This is important as it informs which data undergoes pre-processing and when, as not every data point needs to be analysed at the same time - some data insights can be utilised after the event whilst others are best applied in real time. There are other considerations too, such as how and when to undertake exploratory data analysis and how best to visualise data (which is often technical and complex) to enable decisions to be made (here's a good article on visualisation from Jeffrey Heer of Stanford University).
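
To make the 15 – 20 data point discipline and the real-time versus after-the-event distinction a little more concrete, here is a purely illustrative sketch, again in Python. The metric names, the cadence assignments and the ceiling of 20 are assumptions for the example, not a recommended set.

    from dataclasses import dataclass
    from enum import Enum

    class Cadence(Enum):
        REAL_TIME = "real-time"              # applied as events arrive, e.g. in-session alerts
        AFTER_THE_EVENT = "after-the-event"  # computed later, e.g. overnight MI reports

    @dataclass
    class Metric:
        name: str
        cadence: Cadence

    # The 15 - 20 data point ceiling discussed above, expressed as a hard limit
    MAX_METRICS_PER_DOMAIN = 20

    # Hypothetical metric catalogue for a single business area
    vip_team_metrics = [
        Metric("daily_active_players", Cadence.AFTER_THE_EVENT),
        Metric("net_gaming_revenue", Cadence.AFTER_THE_EVENT),
        Metric("large_single_stake_alert", Cadence.REAL_TIME),
        Metric("session_length_spike", Cadence.REAL_TIME),
    ]

    # Any more than the ceiling is treated as a symptom of data inflation
    assert len(vip_team_metrics) <= MAX_METRICS_PER_DOMAIN, \
        "too many data points for one domain"

    # Only the real-time metrics need stream processing; the rest can be
    # computed after the event as part of overnight batch jobs
    real_time = [m.name for m in vip_team_metrics if m.cadence is Cadence.REAL_TIME]
    print(real_time)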

Whilst the potential value of data and analytics is large, the risk of data inflation makes realising that value in practice very difficult.