In this webinar, we’ll show you how Cloudera SDX reduces the complexity in your data management environment and lets you deliver diverse analytics with consistent security, governance, and lifecycle management against a shared data catalog.
Of course! We have our internal EDH cluster. That would be easy!
With increased focus on … business insights.. dashboard … FAST...
Charles, SVP, Emerging Businesses
Mulyadi, Data Scientist
Pipelines! Workloads! Queries! More pipelines. More workloads! More queries! Even more….
Mulyadi, Data Scientist
Alan, Internal EDH Data Platform Manager
Adding more workloads to Internal EDH clusters is risky and adds uncertainty to existing SLA-sensitive workloads.
Multi-disciplinary analytics: the vehicle to go from insight to action.
It is the foundation for all your analytics innovation.
It determines how successfully you can meet both your business and operational goals.
It will enable you to ask bigger questions and solve complex analytics problems.
Everyone is doing it – that is our experience with our customers.
We’ve previously done webinars on just that topic, so please reach out if you’d like to receive a link to a replay of those.
Requires a platform that:
Supports multi-function analytics
Minimizes time to add workloads
Supports elastic workloads
Enables self-service
Provides a scalable model for sharing data
Reduces cost
Increases tenant isolation
Secures the environment
We’ve all been doing analytics for a very long time, for a variety of reasons and in a range of departments. So we’ve accumulated some baggage.
We’ve gathered typical legacy systems. Each application does its own thing, with its own data storage and its own context.
The result is many data silos, each requiring its own proprietary tools and infrastructure. Each application or workload has dedicated compute, data storage and context.
Different vendors, products, and services. Throughout a project they are all needed, in different ratios. You start off with:
ETL/DE
DW/M
Analytics in SQL
NoSQL/RT
DS
Each in its own siloed application. And you may have several of each kind, for each workload.
A fragmented approach is difficult, expensive, and risky. Managing and guaranteeing security, compliance, and so on across the board is hard. Nothing is shared. Context is re-established each and every time: new application or workload, new context. Whole disciplines have grown up around trying to keep all of this in sync.
A struggle or, for the Disney fans, a tale as old as time between business and IT. More beast than beauty, though.
Different problems for different users.
IT
Each application or workload is single use, data sources are inflexible
See lots of redundancy of data, inefficient operating environment
Users
Can’t find data they need. Waiting on IT
Finding out-of-date data
Doing prep, not finding insights
Head of data and analytics
Administrative, not innovating – choosing databases rather than operationalizing
Users resort to shadow IT to work around their challenges
Organisation vs business unit
So then, use a big data platform on premises: a single physical cluster. Everything is in one place, so the shared data experience for multiple workloads and tenants is a given.
Strong
multi-function support
Shared data experience
Information security model
Ok-ish, moderate
Cost management
Tenant isolation
Workload elasticity
And it’s OK to be on premises, even though two areas remain weak:
Weak
Self service
Speed of deployment
To address those two challenges, organisations are branching out and looking at alternatives to meet the business needs in those areas. Time to insight and action needs to be ever shorter: that calls for self-service. New capacity is needed in the next 10 minutes and the need is gone by tomorrow. That sounds like cloud, right?
Well, certainly right on those fronts. Cloud deployments are strong where on-premises is weak:
Strong
Tenant isolation
Workload elasticity
Self-service
Speed of deployment
But if you haven’t carefully looked at your cloud strategy, you can find yourself re-introducing context silos.
Weak
Shared data experience
Information security model
It’s compounded by the fact that we’re now dealing with a mix of permanent and transient workloads. We are back to square one in terms of challenges, the biggest being security and governance.
Irrespective of the audience, the impact on the business is detrimental.
The crux is the data context. To have a workload, you need data, compute power AND context
Compute and data are becoming further separated
Compute is stateless: cloud-based or on-prem, either transient or long-running
Data is stateful: cloud-based or on-prem in HDFS, Kudu, S3, ADLS, Isilon, etc.
What about data context?
Schema Definitions
Permissions
Encryption Keys
Governance
Data context should be stateful, but currently is stateless
This creates synchronization and usability challenges for admins and end users alike
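One way to picture the point above is a minimal Python sketch in which context lives in a long-lived store instead of inside each compute cluster. All names here (`DataContext`, `ContextStore`, `TransientCluster`, the `sales` dataset, and the key ID) are hypothetical illustrations, not Cloudera APIs. The sketch only assumes that context means the items listed above: schema definitions, permissions, encryption keys, and governance.

```python
from dataclasses import dataclass, field


@dataclass
class DataContext:
    """Hypothetical bundle of the context items listed above."""
    schema: dict            # column name -> type
    permissions: dict       # user -> set of allowed actions
    encryption_key_id: str  # a reference to a key, not the key itself
    governance_tags: set = field(default_factory=set)


class ContextStore:
    """Long-lived (stateful) home for context, shared by every cluster."""

    def __init__(self):
        self._contexts = {}

    def register(self, dataset, context):
        self._contexts[dataset] = context

    def lookup(self, dataset):
        return self._contexts[dataset]


class TransientCluster:
    """Stateless compute: keeps no context of its own, only a reference."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def can_read(self, user, dataset):
        ctx = self.store.lookup(dataset)
        return "read" in ctx.permissions.get(user, set())


store = ContextStore()
store.register("sales", DataContext(
    schema={"order_id": "bigint", "amount": "decimal(10,2)"},
    permissions={"mulyadi": {"read"}},
    encryption_key_id="key-001",
    governance_tags={"retention:7y"},
))

# Two short-lived clusters see the same context without re-creating it.
etl = TransientCluster("etl-tonight", store)
adhoc = TransientCluster("adhoc-dashboard", store)
results = [c.can_read("mulyadi", "sales") for c in (etl, adhoc)]

# Tearing down compute does not touch the context.
del etl
still_there = store.lookup("sales").schema
```

If each cluster carried its own copy of the schema and permissions instead, every new or transient workload would have to re-establish and re-synchronize them, which is the challenge described above.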
So we introduced Cloudera SDX, or Shared Data Experience: the foundation of Cloudera Enterprise.
SDX makes it possible for companies to run dozens, even hundreds, of analytic applications against a common pool of data. One logical cluster provides a shared data experience to multiple workloads and tenants.
SDX applies a centralized, consistent framework for catalog, security, governance, management, data ingest and more.
It makes it faster, easier, and safer for organizations, teams, and people to develop and deploy high-value, multi-function use cases like customer next best offer, clinical prediction, and risk modeling.
SDX cuts through silos to unify data, analytics, management, security, and governance, and empowers self-service.
It combines the strengths of on-premises and cloud-only deployments:
* multi-function support
* shared data experience
* information security model
* cost management
* tenant isolation
* workload elasticity
* self service
* speed of deployment
SDX is a set of open platform services built for multi-function or multi-disciplinary analytics and optimized for the cloud. This means:
First, a shared catalog that helps to define and preserve the structure and the business context of all your data, regardless of where it happens to reside.
Second, a unified security model that helps protect sensitive data with a consistent set of controls.
Third, a consistent governance model that enables secure, self-service access to all of your relevant data. Not just one type of data, but really all of it, increasing your ability to be compliant, particularly in a regulated environment.
Next, flexible data ingest and replication. We work with a number of core partners in this arena who help you aggregate a single copy of all of your data, which makes disaster recovery easier and eases migration of data from one place to another.
Finally, easy workload management that increases user productivity and boosts job predictability.
So, SDX is really a core piece of how we at Cloudera separate ourselves from the competition.
A couple of examples of how SDX helps. The list is endless.
Santosh
Speak about how, over time, this will all become one thing
Santosh
Prefix with a couple of slides illustrating the challenge without SDX
Impact of consistency on the journey
Enables
Simplifies
Example of large European financial org:
Reduced fraud, improved accuracy from 64% to 98%, and 80% faster customer service
Cluster spin-up with full context, deployed to three different clouds in under 1 hour
Three admins vs 15
Concept to production in 6 months