Learn about:
Why big data matters to your business: realize revenue, increase customer loyalty, and pinpoint effective strategies
The business and technical challenges of big data solutions
How to leverage big data for competitive advantage
The “must haves” of an effective big data solution
Real-world examples of Cloudera, Pentaho and Dell big data solutions in action
Webinar | Using Hadoop Analytics to Gain a Big Data Advantage
1. Using Hadoop Analytics to Gain a Big Data Advantage
Jonathan Seidman, Solution Architect, Cloudera
Ian Fyfe, VP Product Marketing, Pentaho
Jeff Stacey, Director of GTM Strategy, Channel & Sales Development, Dell
2. Why big data matters to your business
Jonathan Seidman, Cloudera
2 Confidential Big Data Solutions
3. Explosive Data Growth
1.8 trillion gigabytes of data was created in 2011…
• More than 90% is unstructured data
• Approx. 500 quadrillion files
• Quantity doubles every 2 years
[Chart: gigabytes of data created (in billions), 2005 to 2015, structured vs. unstructured data; y-axis 0 to 10,000]
Source: IDC 2011
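The growth claim on this slide can be turned into a rough projection. The sketch below is only an illustration: it assumes the "doubles every 2 years" rate holds constant from the 2011 figure, which the slide does not guarantee, and the function name is made up for this example.

```python
def projected_gigabytes(base_gb: float, base_year: int, year: int,
                        doubling_years: float = 2.0) -> float:
    """Project data volume, assuming a constant doubling period (an assumption)."""
    return base_gb * 2 ** ((year - base_year) / doubling_years)


# 1.8 trillion gigabytes created in 2011 (figure quoted on the slide, source IDC 2011).
BASE_GB = 1.8e12

for year in (2011, 2013, 2015):
    print(year, f"{projected_gigabytes(BASE_GB, 2011, year):.2e} GB")
```

Under that assumption the 2015 volume is four times the 2011 volume, which is roughly in line with the scale of the chart's y-axis (in billions of gigabytes).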
4. The ‘Big Data’ Phenomenon
Big Data Drivers
• The proliferation of data capture and creation technologies
• Increased “interconnectedness” drives consumption (creating more data)
• Inexpensive storage makes it possible to keep more, longer
• Innovative software and analysis tools turn data into information
[Diagram: More Content, More Devices, More Consumption, New & Better Information]
Big Data encompasses not only the content itself, but how it’s consumed
• Every gigabyte of stored content can generate a petabyte or more of transient data*
• The information about you is much greater than the information you create
*Source: IDC 2011
5. The Opportunity: Quickly gain a competitive advantage
• Big opportunity to drive revenue, e.g.
– Predict customer behavior across all channels (Web site, social media, email, etc.)
– Understand and monetize customer behavior
– Predict customer churn
• Big opportunity to reduce costs, e.g.
– Networks – predict failure, neutralize attacks
– Machines/sensors – predict failures
– Financial risk management – reduce fraud, increase security
Use Cases
• Ecommerce – predict customer behavior across all channels to drive revenue
• E-gaming – understand and better monetize customer behavior
• Networks – predict failure, neutralize attacks to reduce costs
• Customers – predict churn, optimize revenue
• Machines/sensors – predict failures, reduce costs
• Financial risk management – reduce fraud, increase security
6. Big data challenges
Ian Fyfe, Pentaho
7. Big Data Challenges
Cost-effectively managing the volume, velocity and variety of data
Deriving value across structured and unstructured data
Adapting to context changes and integrating new data sources and types
8. The Current Solutions
Current database solutions are designed for structured data.
• Optimized to answer known questions quickly
• Schemas dictate form/context
• Difficult to adapt to new data types and new questions
• Expensive at petabyte scale
[Chart: gigabytes of data created (in billions), 2005 to 2015, structured vs. unstructured data; y-axis 0 to 10,000, with a 10% label]
9. Common Data Analytics Architecture
[Diagram components: data sources; data collection; storage-only grid (original raw data); ETL compute grid; RDBMS (aggregated data); BI reports & interactive apps; tape archive]
• Offline data can’t be analyzed easily
• Can’t explore original high fidelity data
• Moving data to compute doesn’t scale
10. Leveraging big data for competitive advantage
All
12. Big Data Analytics at TravelTainment
Multi-channel distribution platform for the travel industry
“Pentaho Business Analytics fits perfectly into our open source Big Data environment.”
-- Ibrahim Husseini, Director of Data Warehouse, TravelTainment
• Business challenge: Inefficient and time-consuming reporting capabilities on big data sets with legacy system.
Benefits
• Ability to visualize its very large data volumes for reporting and analysis in such a way that non-technical users can also easily understand them
• Can now run complex reports three times faster and with more flexibility than before
• For the first time can offer clients user-friendly, self-service and ad-hoc reporting services, helping IT focus on their main business and not serve as support desk for reporting
Why Pentaho
• Capability to analyze data from Hadoop and Hive
• Professional support for in-depth analysis
• Self-service analysis and reporting for business customers
• Cost effective solution
13. Dell uncovers new insights and reduces IT costs by US$35 million with a business intelligence solution designed for big data
Dell Business Intelligence Practice
• Accelerated customer shipment time by 33 percent
• Saved US$2 million by improving product quality
• Integrated data silos
• Reduced IT costs by US$35 million
• Increased agility
14. SecureWorks slashes the cost of storage with Dell | Cloudera Solution
Organization
SecureWorks is a true security partner to help protect your IT assets, comply with regulations and reduce costs, without having to build your internal security expertise from scratch.
Challenge
SecureWorks needed a highly scalable solution for collecting, processing, and analyzing massive amounts of data collected from customer environments.
Solution
The organization deployed the Dell™ | Cloudera® Solution with Cloudera’s distribution of Apache® Hadoop® software, Dell-developed Crowbar software framework, PowerEdge™ C2100 servers, Force10 switches, Dell and Cloudera services in a solution based on a Dell reference architecture.
“Our storage cost per gigabyte is 23 cents. We thought we had great economics previously when we were spending about seventeen dollars per gigabyte.”
-- Robert Scudiere, Director of Engineering, Dell SecureWorks
Benefits
• Reduced the cost of data storage to 23 cents per gigabyte
• Gained easy scalability for future growth
• Leveraged open source software and commodity hardware to reduce time to market
• Maintain high availability for critical services and flexibility to analyze structured and unstructured data
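To put the quoted storage costs side by side, here is a small arithmetic sketch. The two per-gigabyte prices come from the quote on this slide; the 1,000 GB volume is a made-up figure used only to illustrate the difference, not a number from the case study.

```python
# Per-gigabyte storage costs quoted on the slide (US dollars).
OLD_COST_PER_GB = 17.00  # "about seventeen dollars per gigabyte"
NEW_COST_PER_GB = 0.23   # "23 cents" per gigabyte


def storage_cost(gigabytes: float, cost_per_gb: float) -> float:
    """Total storage cost for a given volume at a flat per-gigabyte price."""
    return gigabytes * cost_per_gb


reduction_factor = OLD_COST_PER_GB / NEW_COST_PER_GB
print(f"Cost per gigabyte fell by a factor of about {reduction_factor:.0f}")

# Hypothetical volume, chosen only for illustration.
EXAMPLE_GB = 1_000
old_total = storage_cost(EXAMPLE_GB, OLD_COST_PER_GB)
new_total = storage_cost(EXAMPLE_GB, NEW_COST_PER_GB)
print(f"{EXAMPLE_GB} GB: ${old_total:,.2f} before vs. ${new_total:,.2f} after")
```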
15. Must-haves of an effective big data solution
Jeff Stacey, Dell (16, then 24 to close) & Jonathan Seidman, Cloudera
16. Big Data Solution Requirements
Cost-effectively manage the volume, variety and velocity of data
Process and analyze large, complex data sets…quickly
Flexibly adapt to context changes and new data types
17. Why was Hadoop created?
Exploding data volumes & types LEADS TO dramatic changes in enterprise data management
[Diagram: data sources feeding big data: digital content, web logs, social media, files, smart grids, operational data, transactional data, ad impressions, R&D data]
With Hadoop, you can…
NEW OPPORTUNITY
• Extract more value
• From more data
• More cost effectively
• With greater flexibility
HARD PROBLEMS
• Deep analysis
• Exhaustive and detailed
• Sophisticated algorithms
• Quick results
BIG DATA
• Any kind
• From any source
• Structured and unstructured
• At scale
It’s difficult to handle data this diverse at this scale.
Traditional platforms can’t keep pace.
18. What is Apache Hadoop?
Hadoop is a platform for data storage and processing that is…
• Scalable
• Fault tolerant
• Open source
CORE HADOOP COMPONENTS
• Hadoop Distributed File System (HDFS): File sharing and data protection across physical servers
• MapReduce: Distributed computing across physical servers
Consolidates everything
• A single repository for storing and mining any type of data
• Not bound by a single schema
Excels at complex analysis
• Scale-out architecture divides workloads across multiple nodes
• Flexible file system eliminates ETL bottlenecks
Scales economically
• Can be deployed on commodity hardware
• Open source platform guards against vendor lock
19. Core Hadoop: HDFS
Self-healing, high bandwidth CLUSTERED STORAGE
[Diagram: a file split into numbered blocks 1 to 5, with copies of each block spread across the nodes of the cluster]
HDFS breaks incoming files into blocks and stores them redundantly across the cluster
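The idea on this slide, splitting a file into blocks and keeping several copies of each block on different nodes, can be sketched in a few lines. This is a toy model, not HDFS code: the tiny block size, the node names and the round-robin placement are all assumptions made for illustration, and real HDFS uses much larger blocks and a rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the input into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def surviving_blocks(placement: dict[int, list[str]],
                     failed_nodes: set[str]) -> set[int]:
    """Return the blocks that still have at least one replica on a live node."""
    return {
        block_id
        for block_id, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    }


# Hypothetical cluster and file, for illustration only.
nodes = ["node-1", "node-2", "node-3", "node-4"]
blocks = split_into_blocks(b"x" * 50, block_size=10)   # 5 blocks
placement = place_replicas(len(blocks), nodes, replication=3)

for block_id, holders in placement.items():
    print(f"block {block_id + 1}: {holders}")

# With 3 replicas per block, losing any 2 nodes still leaves every block readable.
print(surviving_blocks(placement, {"node-1", "node-2"}))
```

Because every block lives on three different nodes in this sketch, any two nodes can fail without losing data, which is the redundancy the slide describes.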
20. Core Hadoop: MapReduce
Framework for DISTRIBUTED COMPUTING
[Diagram: MapReduce (MR) tasks running against numbered blocks 1 to 5 spread across the nodes of the cluster]
Processes many jobs in parallel across many nodes and combines the results
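The map, shuffle and reduce steps behind this slide can be illustrated with the classic word count. The sketch below is plain Python running in a single process, not the Hadoop API; the function names and the sample input are made up for this example. In a real cluster the map calls would run in parallel on the nodes that hold each block.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(split: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> int:
    """Reduce: combine the values for one key (here, sum the counts)."""
    return sum(values)


def run_job(splits: list[str]) -> dict[str, int]:
    """Run map over every split, shuffle the output, then reduce each key."""
    mapped = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle(mapped)
    return {key: reduce_phase(key, values) for key, values in grouped.items()}


# Each string stands in for one block of a larger file.
print(run_job(["big data", "big insights"]))
```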
21. Major Hadoop Utilities
• Apache Pig: High-level language for expressing data analysis programs
• Apache Hive: SQL-like language and metadata repository
• Apache HBase: The Hadoop database. Random, real-time read/write access
• Hue: Browser-based desktop interface for interacting with Hadoop
• Apache Zookeeper: Highly reliable distributed coordination service
• Oozie: Server-based workflow engine for Hadoop activities
• Flume: Distributed service for collecting and aggregating log and event data
• Sqoop: Integrating Hadoop with RDBMS
• Apache Whirr: Library for running Hadoop in the cloud
23. The unrivaled leader in Hadoop
• Worldwide #1 distribution of Apache Hadoop
• 100% Open-Source Hadoop Distribution
• Largest contributor to the open source Hadoop ecosystem
– Project founders from 8 of the 13 leading Apache Projects
• Cloudera has more Apache committers on staff than any other company
• More than 100 enterprise & public sector customers across a wide variety of industries
24. Dell | Cloudera Solution with Pentaho
Dell Value
Business intelligence practice
Open & scalable infrastructure
Certified and tested platforms
Active community participation
Crowbar deployment tool
Reference Architecture
Deployment Guide & Services
Joint support with Cloudera
Actual customers
25. Industry first: PowerEdge C8000
Mix and match for the ultimate performance in a dense 4U package
• Speed up your most resource-intensive workloads by mixing and matching compute, storage and/or GPU nodes in the same 4U shared infrastructure chassis
• Get the cores, memory and I/O expansion you need for peak workload performance
Great for: Big Data, Web 2.0/Hosting, HPC
Mix & Match
• Mix compute, storage and GPUs in the same 4U chassis
• More workload flexibility, HD & I/O options than the HP SL6500 or Super Micro 6047R
Get faster results with more compute power
• Intel Xeon E5-2600 processors boost performance by 80%
• Up to 135W support
• 2x the I/O bandwidth with PCI Express Gen3
Do more with less
• Shared infrastructure reduces power & cooling costs by ~20%
• Refresh with the latest components without having to replace the entire chassis
26. • Visual design for Hadoop
• Reduces skills requirements
• Deep integration with Hadoop
– HDFS, MapReduce, Sqoop, Oozie
– Runs as MapReduce in-Hadoop
• Easily connects Hadoop to other enterprise data sources
• Broadens Hadoop use to data analysts, business users and IT
[Diagram: Reporting & Dashboards; Data Discovery, Visualization; Predictive Analytics; Data Ingestion, Manipulation, Integration, Workflow]
27. Fast Visual Development for Hadoop
Ingestion / Manipulation / Integration
Scheduling
Modeling
28. Discovery > Proof of Value > Deployment
29. Summary
Dell | Cloudera Solution with Pentaho
Cost-effectively manage the volume, velocity and variety of data
Derive value across structured and unstructured data
Rapidly adapt to context changes and integrate new data sources and types
30. Q&A
Ian Fyfe, Pentaho
31. Start getting big insights
Jonathan Seidman, Cloudera
jseidman@cloudera.com
www.cloudera.com
Ian Fyfe, Pentaho
ifyfe@pentaho.com
www.pentaho.com
Jeff Stacey, Dell
Hadoop@dell.com
www.dell.com/hadoop
Editor's notes
Come up with something new – to the point of what they are looking for. Start with some stories: how real firms used our products, had a problem, solved it. Show how they can’t solve the problem with the tools they have.
Business users asking more sophisticated questions
• Explore data in more detail
• Combine a variety of data
• Extract actionable information and insight from it quickly
Traditional “big data” solutions
• Extremely expensive
• and/or not enough detail
TravelTainment, a provider of multi-channel distribution platforms for the travel industry, is using Pentaho Business Analytics for self-service analytics and reporting in a Big Data environment. With the continually booming online travel market, TravelTainment’s different clients required more insight into its data to help them plan promotions and other services. Before Pentaho, the company had acquired a set of legacy systems that had grown around individual products with limited reporting capabilities. As a result, reporting was inefficient and time consuming for IT. When TravelTainment decided to standardize on a single customer-focused reporting application, it chose Pentaho Business Analytics for the solution’s self-service reporting and ability to manage Big Data sets. Pentaho Reporting enables TravelTainment to run reports three times faster and with more flexibility than before. TravelTainment can now, for the first time, offer its clients user-friendly, self-service and ad-hoc reporting services. This also means that TravelTainment’s developer team can now fully concentrate on its main business, rather than having to serve as a support desk for reporting. With the success of this implementation, TravelTainment now plans to evaluate using Pentaho Data Integration (PDI) to move its data in and out of Hadoop.
http://content.dell.com/us/en/enterprise/d/corporate~case-studies~en/Documents~2011-dell-bi-11003262.pdf.aspx
Business need
With explosive data growth and the proliferation of data silos, Dell spent millions on data management without monetizing information. It needed to integrate enterprise data to improve information accuracy, cut costs, and uncover actionable insights.
Solution
Dell Enterprise Business Intelligence (EBI) consultants helped design and deploy an integrated, global enterprise data warehouse solution, combining Teradata, Informatica, and other BI software with new and existing Dell infrastructure components.
Benefits
• Accelerated customer shipment time by 33 percent and decreased the shipment backlog
• Saved US$2 million by improving product quality and avoiding component replacements
• Integrated data silos, offering an enterprise-wide view of information while reducing IT costs by US$35 million
• Increased agility by providing information workers with self-service capabilities for accessing certified global data
Introducing the four products that make up the PowerEdge C8000 series:
• The PowerEdge C8000 4U shared infrastructure chassis
• The PowerEdge C8220 single-wide compute sled
• The PowerEdge C8220x double-wide GPU sled
• The PowerEdge C8000x double-wide storage sled
The PowerEdge C8000 chassis holds up to 8 single-wide compute sleds or 4 double-wide compute sleds. Each compute sled is equivalent to a standard server built with a processor(s), memory, network interface, baseboard management controller, and local hard drive storage. The C8000 will be the only 4U shared infrastructure on the market that gives customers compute, GPU, and storage options in one chassis with the ability for internal or external power. Zeus delivers the greatest amount of configuration flexibility and front-side serviceability. Zeus’ flexibility allows customers to standardize on a single architecture. By using the same common chassis design for a variety of configurations, the PowerEdge C8000 series can be scaled out, just like a versatile Lego block.
The advantages of Zeus: By using the same basic building block over and over again, our customers can get the performance they need, with less deployment and maintenance time needed. This efficient use of IT resources plus the shared infrastructure savings help lower the total cost of ownership. Technology refresh cycles can be staggered to further reduce the total cost of ownership over several years.
Emphasize results they can achieve! Go back to customer case studies.