A Decade On: The Evolution of Hadoop at Age 10

December 22, 2014
Scott Etkin
http://data-informed.com/decade-evolution-hadoop-age-10/

Apache Hadoop turns 10 in 2015. What started as an open-source project intended to enable Yahoo! Internet searches has become, in a relatively short time, the de facto architecture for today’s big data environments.

As big data exploded in 2014, Hadoop adoption and investment expanded along with it. Today, Hadoop is deployed across industries including advertising, retail, healthcare, social media, manufacturing, telecommunications, and government. But it won’t be long before companies begin demanding to see a return on their Hadoop investments.

“Hadoop has been rapidly adopted as the way to execute any go-forward data strategy,” said Gary Nakamura, CEO of Concurrent, Inc. “However, early adopters must now show return on investment, whether it’s migrating workloads from legacy systems or new data applications. Luckily, products and tools are evolving to keep pace with the trajectory of Hadoop.”

Indeed, Hadoop experts see the platform continuing to evolve and grow in 2015.

MapR Technologies CEO and co-founder John Schroeder predicts that, in 2015, new Hadoop business models will evolve and others will exit the market.

“We are now 20 years into open-source software adoption that has provided tremendous value to the market,” said Schroeder. “The technology lifecycle begins with innovation and the creation of highly differentiated products, and ends when products are eventually commoditized.

“Hadoop adoption globally and at scale is far beyond any other data platform just 10 years after initial concept,” he added. “In 2015, we’ll see the continued evolution of a new, more nuanced model of open-source software to combine deep innovation with community development. The open-source community is paramount for establishing standards and consensus. Competition is the accelerant transforming Hadoop from what started as a batch analytics processor to a full-featured data platform.”

Steve Wooledge, Vice President of Product Marketing at MapR, said he sees Hadoop-based data lakes and data hubs becoming the norm in enterprise data architectures in 2015, and self-service data exploration going mainstream.

“Hadoop as a data hub or data lake is a very standard and introductory use case for most organizations,” said Wooledge. “Companies are not sure what value there may be in untapped data sources, such as machine logs from the data center, social media, or mobile interactional data, but they want to harness the data and look for new insights, which they can inject into business processes and operationalize.”

Schroeder agreed.

“In 2015, data lakes will evolve as organizations move from batch to real-time processing and integrate file-based, Hadoop, and database engines into their large-scale processing platforms. In other words, it’s not about large-scale storage in a data lake to support bigger queries and reports. The big trend in 2015 will be around the continuous access and processing of events and data in real time to gain constant awareness and take immediate action.”

Ron Bodkin, founder and CEO of Think Big Analytics, said Hadoop will outgrow MapReduce in the coming year and Spark will grow in importance.

“One of the first things that we can expect from 2015 is that Hadoop clusters will start to benefit from other programming models besides MapReduce to deal with large data sets,” he said. “We already saw YARN begin to gain momentum in 2014 when it got across-the-board support from distribution providers like Cloudera as well as Hortonworks. Expect that this investment will begin to pay off in 2015 as more customers start leveraging YARN’s ability to support alternative execution engines, such as Apache Spark.”

Now that Hadoop has matured and gained widespread adoption, Bodkin said that the coming year could see late adopters finally feeling bold enough to embrace Hadoop.

“Hadoop has long since broken free of its web giant and ad tech heritage, penetrating most industries – notably music as streaming became ubiquitous,” said Bodkin. “In 2015, even late adopters will turn their attention to Hadoop, so expect an uptick in cost-driven implementations around better storage and faster load-times: SAN/NAS augmentation, ETL offload, and mainframe conversions.”

Monte Zweben, co-founder and CEO of Splice Machine, sees Hadoop evolving in the direction of concurrent applications in 2015.

“Big data has bloomed in 2014 as enterprises have invested in platforms like Hadoop. As we enter 2015, getting more out of those initial big data investments will grow as a top priority for businesses,” Zweben said. “Increased competitive pressures and the current appetite for real-time information no longer allows for the old model of waiting for data scientists to take hours or days to generate insights based on out-of-date information. New developments in the Hadoop platform can power applications that can act on insights now, instead of later, and with more recent data.

“Concurrent Hadoop-based applications will become more prevalent in 2015 because of their ability to access real-time data and process transactions like a traditional RDBMS,” he added. “Emerging technologies that allow concurrent transactions on Hadoop enable data scientists and applications to work with more recent and accurate information instead of data that is hours or days old from batch processing. This is a major step in Hadoop’s ongoing evolution to meet the needs of businesses with mission-critical database applications that are having trouble cost-effectively scaling to meet higher data volumes.”