This presentation will explore how Hadoop and Big Data are reinventing enterprise workflows, and the pivotal role of the Data Analyst. It will examine the changing face of analytics and the streamlining of iterative queries through evolved user interfaces. The speaker will cut through the hype around "shorter time to insight" and explain how combining Hadoop and SQL-based analytics helps companies discover emergent trends hidden in unstructured data, without having to retrain data miners or restaff. In particular, it will highlight how this paradigm changes Big Data analysis and illustrate, step by step, how analysts can now connect to Big Data platforms, assemble working data sets from disparate sources, analyze and mine that data for actionable insight, publish the results as visualizations and as feeds for reporting tools, and operationalize MapReduce and Big Data outcomes into company workflows – all without touching the command line.
I’m going to talk about what we’re seeing: how forces of change are impacting how IT operates, the growing role of data, and how data professionals are moving front and center to play major roles in empowering businesses.
I’m going to cover four main themes. First, I’ll talk about the traditional requirements-driven business. Then I’ll summarize what we see as the major factors driving change and opportunities for us all. The bulk of the presentation is about how all of us are becoming agents of change, and how to meet the needs of our roles in this new world of Hadoop-enabled Big Data.
So let’s take a look at the traditional business and, in particular, how it deals with data… The result of all this is that business insight is limited – in scope and in time.
The forces of change are not just about technology …
From working with thousands of users, customers and partners, we’re seeing a blueprint emerge for the new data-centric business. It’s about enablement. In particular, it’s about the utilization of all of a business’s data and the enablement of data professionals and analysts. And useful data is not limited to what a business currently owns. Data marketplaces, aggregators and specialist providers across many industries are opening up their data, providing APIs and creating the promise of even more business- and market-relevant insight.
Listening to yesterday’s keynotes, much of what Larry Feinsmith said rings true and is aligned with what we see and hear, not just from financial services but across multiple verticals. Successful IT organizations are enabling data professionals to be self-service. Whether it’s through on-premise or, increasingly, on-demand in-the-cloud services, this has to be the mantra of the successful future business.
And data professionals are at the heart of this change. They can choose to be concrete or catalyst. We are all innovators. Innovation entails risk, and many of us sometimes baulk at it. But our roles in this new industry of data are growing, our potential to impact our businesses is growing, and because of the value we bring and simple supply-and-demand economics, we will get paid more.
I’ve been at all 3 Hadoop World events. It’s interesting to reflect on how people thought about Hadoop two and even one year ago.
But we’re all becoming – and need to be – more sophisticated. Installing Hadoop is only step 1. A devops team bringing up a CDH cluster, or an inspired developer firing up a Hadoop cluster using Elastic MapReduce, is just the beginning. Businesses are getting smarter about understanding the potential of Hadoop, but also about how to plan for success and what a successful Hadoop-based stack looks like – and about who they enable to access that stack, and how.
What we see are three key classes of data professional. IT is clearly key to the infrastructure. But choosing your Hadoop provider and determining whether you’re going on-premise, in-cloud or with a hybrid strategy is just step 1. Businesses are getting smarter about what comes next.
More sophisticated thinking now takes into account the democratization of access. Here is the common fabric we’re seeing.
Hadoop open source projects also fall into these categories, with the data management projects focused on innovating the core platform and the analytics projects creating the core technology for analysis.
Data engineers often implement existing algorithms in MapReduce, or take the insights created by data analysts and turn them into production code. They also build distributed functions that the analysts can use.
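As a minimal sketch of what “implementing an existing algorithm in MapReduce” can look like, here is a word count written in the Hadoop Streaming style: a mapper that emits (word, 1) pairs and a reducer that sums them over sorted input. The task and function names are illustrative, not from the talk; in a real cluster the two functions would run as separate streaming scripts with Hadoop performing the shuffle and sort between them.

```python
import sys
from itertools import groupby


def map_words(lines):
    # Mapper: emit one (word, 1) pair per word, as a Hadoop
    # Streaming mapper would write them to stdout.
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1


def reduce_counts(pairs):
    # Reducer: pairs arrive grouped by key (Hadoop's shuffle/sort
    # guarantees this); sum the counts for each distinct word.
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    # Locally we simulate the shuffle/sort step with sorted().
    mapped = sorted(map_words(sys.stdin))
    for word, total in reduce_counts(mapped):
        print(f"{word}\t{total}")
```

The point for this audience is that the analyst thinks in terms of “count words”; the data engineer owns the mapper/reducer decomposition and packages it so the analyst never has to.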
So let’s take a look at what we’ve found about what data analysts and engineers need. It’s not just about the command line any more. As you grow the teams of professionals accessing Hadoop, it’s no longer enough to give them a command line, SSH or a rudimentary web interface. People have skills, and skills flourish faster in high-productivity environments.
So what does a workflow optimized for Big Data look like? We think it needs to provide four key stages. It has to enable you to connect to any Hadoop cluster, no matter where it is located or which company or organization it comes from. It needs to provide easy access to data, so you can point, click and automatically understand the data as it’s prepared for analysis. Most importantly, it needs to provide an easy-to-use environment for iterative analysis, with abstraction and visualization capabilities. Finally, it needs to provide the ability to act on any and every insight generated. I’ll walk you through all of these…