I had the opportunity to discuss the topic with two pioneers in the field. Jim Melton is an authoritative expert and writer in the field of database languages. With over 25 years of experience, he chairs ISO/IEC JTC 1/SC 32 on data management and interchange. Dr. Klaus-Peter Eckert is a research scientist at the Fraunhofer Institute for Open Communication Systems. For more than 30 years, he has worked on projects related to interoperability and distributed application platforms and services.
How do we define big data? While I have come across several descriptions, here’s the one-sentence definition from the book Ethics of Big Data that I like to use to get the conversation started: “This aspect of technology is unique in that its very nature (its essence) is to create, connect, correlate, aggregate, store, process, and report on massive amounts of information.”
In its simplest terms, then, big data means using large amounts of complex information to make better-informed business decisions. But as Eckert and Melton explain, big data comes with its own entrenched challenges that need to be addressed before it can have real impact.
Of blind men and elephants
“Until very recently, the global IT community has been looking at big data in the same way that the six blind men in the fable inspected the elephant,” says Melton. “That is, each member of the community considered the subject (big data) from only one perspective, or at most a handful. But big data, like the elephant, is more than the conclusions derived from all of those perspectives combined. It is considerably more complex when contemplated as a whole than as the sum of its parts’ complexities.”
The amount of data being produced by commercial enterprises, such as large corporations, is staggering. According to a McKinsey Global Institute study on big data, global data generation is projected to grow by 40% per year. The problem, however, is no longer acquiring data but knowing what to do with it.
“Big data provides many challenges,” says Melton. “Retention and data quality are only two of them, and not necessarily the most difficult. Processing that data, querying it, analysing it and summarizing it are going to be quite difficult. In many environments, simply describing the data – developing metadata for it – will be vitally important and very difficult to do.”
Eckert believes that the window of opportunity is wide open – from a technical point of view. Unfortunately, convergence of the different approaches and interoperability between the available tools are still lacking.
“We have several tools for big data and data analytics that have been developed by different communities in recent years. We also have different types of IT infrastructure, especially in cloud computing, that have been developed independently of big data. All these technical building blocks are available, but they don’t fit together. Interoperability is missing, and there is no widely agreed architecture that pulls together the many components of big data solutions. And this is exactly where standards efforts come into the game.”
Both Melton and Eckert believe the complexity of big data and the interactions amongst many domains require further analysis. This is where the ISO/IEC JTC 1 Study Group comes in.
Mapping the complexities
The recently created Study Group is expected to give a much clearer picture of the entire scope of the problems, including serious ethical issues such as whether certain uses of big data violate fundamental civil, social, political and legal rights. The end result will be one or more recommendations to JTC 1 concerning the allocation of projects to its subcommittees, at which time formal standards development will begin in earnest.
So, how long until we see the results? There is an old project dilemma that suggests you can’t have it fast, good and cheap – you can only pick two. That trade-off also applies to big data. The Study Group will need to produce a truly meaningful and useful understanding of the problem space, do its work quickly, and simplify that space to reduce the number of standards that will eventually be developed.
“The great unknown is what can practically be standardized, and when the optimum time for such standardization will occur,” says Eckert. “There are great risks in premature standardization, because the technologies will not yet have been adequately studied and understood, but there are also risks in late standardization, after one pre-eminent player has staked out a commanding position in the market. Clearly, JTC 1 and its subsidiary committees will have to prepare themselves well and watch carefully as the technologies develop.”
Over time, analysts predict, big data may well become a new type of corporate asset, cutting across business units and functioning much as a powerful brand does, representing a key basis for competition. If that’s the case, companies need to start thinking about whether they are organized to exploit big data’s potential and to manage the threats it can pose.
“This is a very exciting time to be involved in IT standardization,” says Melton. “I truly believe that addressing the problems, challenges and opportunities associated with big data can create a paradigm shift.”
Melton’s prediction may come true sooner than expected. Investments in 2013 continue to rise, with 64% of organizations investing or planning to invest in big data technology, compared with 58% in 2012, according to a survey by Gartner, Inc.
One thing is clear: success will demand not only new skills and strategies but also new standardization perspectives on how the era of big data could evolve. Over time, the development of standards in this maze of big data complexities will create a platform for interoperability, giving companies the ability to analyse and leverage this data to better serve their markets. So what are you waiting for? Let’s get to it.