1.
For Open Source Disruption,
it's 1994 all over again
Sarangan Rangachari
VP and GM, Storage and Big Data
Red Hat
5.
Open source innovation has redefined how enterprises think about data
Democratized the extraction of business value from data
redhat.com/en/insights/big-data
6.
Business User | Architect | Data Center Operator | App Developer
Multiple Silos. Multiple Views. Multiple Goals.
The Old Data Lifecycle
Manage Build Code Query
8.
Lack of agile, open, and cost effective enterprise-grade solutions
Barriers to Big Data Innovation
I want more than
canned BI queries
I am locked into a
vendor stack
I want to use my favorite
dev framework
I need to integrate
data across silos
Business User
Architect
Data Center Operator
App Developer
11.
Portable, Scalable, On-demand, Measurable
Hybrid architectures, Hybrid deployment models (public, private, hybrid)
Open innovation, Open standards, Open APIs
Open Matters. Choice Matters. Agile Matters.
Seamless transition to the open hybrid cloud
STANDARDIZE VIRTUALIZE REALIZE
Open
Hybrid
Cloud
12.
Big Data innovation cannot happen in a bubble
Strong partnerships with industry leaders and open source communities
Twenty years since Red Hat introduced its first commercial flavor of Linux
Red Hat revolutionized the IT industry through open source innovation, and has been a leader ever since
Red Hat is again poised to revolutionize the industry through open source innovation for big data workloads
The success of open source platforms such as Hadoop and Android is testimony to the success of the open source way
The quality and support of open source software have caught up with, and in some cases surpassed, those of closed, proprietary software
Community-driven innovation works - it compresses the innovation cycle, provides a way for practitioners to influence roadmaps, and evolves with the latest customer requirements and business imperatives
Our view of the world - Volume economics + community driven innovation combine to deliver affordable, best of breed software
Two trends are driving seismic changes in IT: (1) unstructured data, which is rich in business insights, is the fastest-growing component inside data centers, and (2) hardware advances have lowered the entry barrier to enterprise-grade analytics
Hadoop is a great example of how open source innovation has democratized and redefined how businesses mine their data for insights
CTOs and CIOs are quick to point out that each year the Big Data ecosystem chart seems to grow denser, making interoperability and agility top priorities in building an enterprise-grade analytics platform. At last count there were over two million current open source projects accompanied by an 80% increase in open source venture investment
It used to be that only large enterprises with expensive RISC-based servers could afford data analytics. Open source innovation has brought big data analytics to the masses
In fact, in many ways open source innovation has brought about a huge shift in the data life cycle itself
Multiple Silos – Data ingested and processed in separate silos, adding complexity and inefficiency
Multiple Views – No consolidated view of the data. Each stakeholder working off a different data set. Missing out on insights that can be mined by cross pollination
Multiple Goals – Each stakeholder thinks myopically about their department’s goals rather than the goal of the enterprise to monetize data cost effectively. Data center operators are too busy managing infrastructure. Architects are trying to force fit new data types into the environment but end up building new silos in the bargain. App developers create applications that are not capable of mining data in new ways. Business users query the data available to them through a fixed set of queries that do not ask exploratory questions but rather reporting questions.
The business environment is different today – the CIO drives business strategy not just IT strategy. The business looks to IT to unearth the next big idea, next big market, new business models, new ways to cut cost.
Data stakeholders need to speak the same language, work off a consolidated view of the data, and work towards the same goal
Ingest - Data is constantly ingested in many non-standard formats from a variety of data sources. Operators can’t be consumed by housekeeping tasks; instead they must focus on how best to capture new, interesting types of machine- and human-generated data
Integrate – Architects think about integrating silos rather than building new ones. They use data virtualization to hide complexity so developers can query multiple data sources as one.
Discover – Developers write apps that can enable the rapid discovery of new business insights. Analytics are run iteratively on large sets of unstructured and structured data to distill them down to actionable, insight-rich, queryable data.
Act – Business users don’t just query for reporting purposes. They act on the insights available to them. They run ad hoc queries rather than canned ones. They ask exploratory questions about known unknowns and even unknown unknowns because they have been empowered by IT.
When business users can arrive at new business insights in a timely manner, that’s when you know a big data project has been successful. However, there are still many barriers to success.
As with the executives, we hear agility and choice as key imperatives from data stakeholders that create, process or consume data along the lifecycle. Most solutions are either nascent (not enterprise ready, not cloud ready) or too expensive.
While data center operators struggle with shrinking budgets and increasingly volatile workloads, data architects are required to support “rogue” queries that place additional stress on the infrastructure and require integration across multiple silos, often in real time.
Application developers and data scientists complain about the rigidity and lack of choice in programming frameworks and development environments.
Business users are constantly demanding exploratory analytics through a self service model as an upgrade over the standard set of queries available through today’s BI systems.
Unfortunately, most vendors offer “band aid” fixes to their outdated technology in order to protect their legacy software and hardware. What enterprises really need instead is an open, agile, and affordable set of modular building blocks that can address their data analytics requirements today and grow with them as their needs evolve.
Red Hat’s portfolio helps you rapidly develop applications, integrate data sources, and process large volumes of real time data on a secure, flexible, and elastic infrastructure.
Unlike “all or nothing” proprietary technology stacks, you only need to choose the pieces from the picture below that solve your challenges for today, and can build out your solution in a modular fashion to match your growing needs.
There are solutions that address pain points for each stakeholder
Can ingest data from almost any data source in the enterprise
Deployment adds a third and important dimension. Can easily extend to the open hybrid cloud.
Red Hat OpenShift – A Platform as a Service solution that allows developers to use the language, framework, and tools of their choice. Developers can rapidly create new applications, iterate on analytic models, and define their own environments to package up the specific middleware required for each application.
JBoss Middleware – JBoss Data Virtualization consolidates disparate data sources within and outside the enterprise to create a virtualized data store that can be queried as if it were a real, single data source. This enables developers to focus on application development rather than data housekeeping. It also frees up database administrators to focus on more value added tasks. In addition, JBoss Data Grid, A-MQ, and BRMS can support your most demanding real time workloads, and streamline interaction with rigid and complex data tiers.
Red Hat Storage – A software defined, agile storage platform that integrates file and object storage and Hadoop data services for petabyte-scale enterprise workloads. The recent addition of Ceph Enterprise gives you another leading storage solution especially if you choose to run your workload on an OpenStack infrastructure. We have invested jointly with Hortonworks in a Hadoop plugin to enable in-place analytics on Hadoop data without incurring expensive data movement.
Red Hat OpenStack – An Infrastructure as a Service solution to allow administrators to quickly adopt and embrace new infrastructure components, service providers, and tools. It represents one solution to manage across deployment platforms and technology stacks, including third party infrastructure, thus lowering administration costs.
Red Hat Services & Consulting – Backed by many years of deep expertise in guiding customers from concept diagrams through data center deployment to ensure big data success.