What Is The Future of Big Data? | Straighttalk

By Bernard Marr, CEO, Advanced Performance Institute

This article is by Featured Blogger Bernard Marr from his LinkedIn page.

I had the pleasure of speaking to Mike Olson, one of the founders of Cloudera, to explore the future of big data.

Cloudera was established in 2008, when few had heard of the term “Big Data”, and has gone on to establish itself as a driving force in the field. Not only does it provide the Open Source technology which underpins many of today’s most demanding and groundbreaking analytics projects, it also invests heavily in the development of new tools and applications. These are opening up technologies such as machine learning, real-time analytics, and more efficient use of unstructured data to a bigger potential user base than ever.

After leaving Oracle, Olson worked on developing open source database software before teaming up with former Yahoo, Google and Facebook engineers who had previous experience with Hadoop.

In 2009 their company, Cloudera, became the first commercial vendor of Hadoop, facilitating an explosion in the use of Big Data analytics in industry. Hadoop offered affordable access to large-scale distributed storage and to the fundamental technologies, such as MapReduce, that are necessary for what we today call Big Data projects.

Olson tells me, “When we started in 2008 no one was talking about Big Data at all. The only people who knew about Hadoop were Java programmers working for Facebook or Yahoo.

“So in the early days we had to be super-evangelical. Why does data matter? Why do we need so much of it, and why is this platform the right approach?”

Fast forward just three or four years and this is no longer the case – every analyst is declaring that Big Data is the tool which will redefine business, and the strange-sounding word “Hadoop” is on the tip of every tongue in the tech industry.

However, Hadoop still existed primarily as two somewhat complex components: the HDFS file system, which allows huge amounts of data to be spread across vast volumes of cheap, off-the-shelf storage components, and the MapReduce framework, which enables that data to be retrieved and processed.
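For readers who have never seen MapReduce, the idea can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle and reduce steps running on one machine, not Hadoop’s actual API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example, and a real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all the values seen for one key into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in sequence over a list of text documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

Even in this toy form, the person asking the question has to write both the map and the reduce logic themselves.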

“You could land the data in one place,” Olson tells me, “and you could get at it with obscure tools like MapReduce, but you had to write the tools to do it.

“What’s happened in the last few years is an explosion – not just of vendors of the platform, companies like ours – but also a rich ecosystem of other companies innovating in the space, adding value and also competing to drive real value for the customer.”

Undoubtedly it was that ecosystem – further Open Source developments such as HBase, Spark and Impala (the last created by Cloudera) – which has driven the opportunities we are seeing today with Big Data. No longer purely the domain of those trained in statistics and computer science, Big Data is put to work in the medical field to create new treatments and cures, in financial services to prevent fraudulent transactions, and by humanitarian organizations to deal with the results of war and natural disasters.

“You know, if you’re a software guy … you don’t usually get to work on stuff like this,” Olson says. “What’s happening is that Big Data is giving us avenues for improving medical care that were never there before. There are novel projects underway in every single industry applying data to meaningful problems.

“Hey, look, we will absolutely let advertisers target their ads better on the internet – that’s going to happen. We’ll let retailers make better offers and engage with their customers better than ever before, and those are perfectly reasonable and good things to do.

“In addition, though, Big Data is allowing us to work on a lot of really meaningful social and economic problems. And it’s a thrill to be able to do that. Look – I’ve been in the database industry for 25 years and I don’t get to tell these stories about the early part of my career. It’s only now, with the advent of Big Data, that we are talking about stuff like this.”

As an example of a project he is particularly proud to have helped enable with the technology he pioneered, Olson points to the Cerner Corporation’s work on predictive analytics based on patients’ medical records, as well as his personal involvement with the Precision Medicine Initiative.

New, innovative tools will also, in part, be the answer to another problem facing the analytics field – the growing gap between industry’s (and the world’s) need for data analytics, and the number of people trained to carry it out.

Olson says, “Give me credit for being a crotchety old guy back from 2008, when we started up – but back then Hadoop was a mystery to the whole planet. Now, it’s a pretty well understood technology.

“We’ve made a lot of progress in addressing the skills gap – we’ve trained tens of thousands of people, and we’ll continue to do that.

“But it’s not like we’re going to train every human on the planet how to ‘do Big Data’. What’s going to happen is we’re going to solve the problem with software. We want researchers to be able to fire up the Big Data predictive dashboard, find the right molecule to treat the specific disease a patient has, and what’s going on under the cover is of no consequence to the person doing the work. We’ve got some mileage left to cover, but that’s how we’ll solve the skills crisis.”

So, if everyone can soon be a data scientist, will there still be a need for companies like his?

“Part of what we do is looking into the future and understanding where the data management platform is headed,” Olson answers. “It’s complicated, and if you’re in the business of running a financial services company or developing drugs, you’ve got a full-time job. Spending days scanning the horizon for potentially disruptive technology may be fun but it’s not what you’re paid to do.

“It [a Big Data platform] has got to be easy to install and operate. You’ve got to be able to hit service level agreements with your users, you must be able to pass a rigorous security audit. This is stuff a specialist vendor, like us, knows how to do, and that’s why we’re growing so quickly.”

Finally, I asked Olson how he thought the attitude of the general public towards Big Data was likely to change. Are individual citizens likely to increasingly see the benefits of allowing corporations or governments access to data on their private lives, in return for ever-increasing levels of comfort, convenience and safety? Or is a growing distrust going to end with individuals attempting to claw back as much of their privacy as possible? I’ve often thought we could be just one large-scale, truly devastating hack away from a widespread U-turn in public opinion. Imagine if Facebook or Google were hacked, and everyone’s conversations and internet history were made public for the world to see – would anyone ever trust Big Data again?

“I take that threat very seriously,” Olson replies. “The reason we’ve invested so much in this space is because of the damage that could be done by poorly secured infrastructure. We’ve got to build the mechanisms that protect the data.

“But besides that there’s an ethical obligation on the creators and users of that data – the organizations that collect and analyze it – to do so responsibly.

“We are in the middle of that ethical dialogue right now, but I don’t think we are there yet. It’s a discussion that is frankly young and it’s one that I’d like to see happening more aggressively. If we do those things then I think we will be okay. Those of us who are concerned about these issues must drive the community at large to a responsible position, and that’s a clear focus for me, personally.”

So, clearly Olson sees the need for ongoing commitment to developing new tools and continuing to innovate, while making advanced analytics increasingly accessible to as many people as possible.

In addition, he sees a need to be equally enthusiastic in encouraging dialogue between the development community, industry, governments and the public at large, regarding who should have access to data, and who shouldn’t. Both of these are certainly necessary if Big Data is going to keep the promises it has made to the world.

Originally published on LinkedIn