© Copyright 2014 EMC Corporation. All rights reserved.
Deep Dive on Pivotal HD - World-Class HDFS Platform
Michael Goddard
Agenda
• Pivotal
• Pivotal Business Data Lake
• Introducing Pivotal HD 2.0
• Pivotal HD 2.0 and Isilon Update
• Customer Success
• Q&A
What Matters: Apps. Data. Analytics.
• Apps power businesses, and those apps generate data.
• Analytic insights from that data drive new app functionality, which in turn drives new data.
• The faster you can move around that cycle, the faster you learn, innovate and pull away from the competition.
How Pivotal Gets You There
• Uniquely positioned to help enterprises modernize each facet of this cycle today
• Comprehensive portfolio of products & services spanning Big Data, PaaS & Agile
• Converging these technologies into a coherent, next-gen Enterprise PaaS platform
Diagram labels: Pivotal Labs Agile Development; Pivotal Data Fabric; Pivotal One; PaaS
Pivotal’s Big Bets for the Future
1. HDFS becomes the data substrate for the next generation of data infrastructures
2. A set of integrated, consumer-grade services must evolve on top of HDFS: stream ingestion, analytical processing, and transactional serving
3. Provisioning flexibility and elasticity become critical capabilities for this data infrastructure
Pivotal Business Data Lake
Govern where it matters
• Focus on MDM and RDM
• Enforce only when sharing
• Treat corporate as aggregation of local
Encourage local requirements
• Let the business decide what they need
• Build from the bottom
• Enable traceability to source
• Disposable data views
Distill on demand
• Select only what you want
• Business-friendly tooling
• Re-usable information maps
• Rapid change cycle
Store everything
• Store everything ‘as is’
• Include structured and unstructured data
• Store it cheaply
Pivotal Business Data Lake Architecture
Diagram layers:
• Centralized Management: System monitoring, System management
• Unified Data Management Tier: Data mgmt. services, MDM, RDM, Audit and policy mgmt.
• Processing Tier: Workflow Management, In-memory, MPP database
• Unified Sources: Real-time ingestion, Micro batch ingestion, Batch ingestion
• Flexible Actions: Real-time insights, Interactive insights, Batch insights
• HDFS
• Inputs: Existing Sources, New Data Sources
Pivotal Business Data Lake Architecture
Diagram layers, with products filled in:
• Centralized Management
• Unified Data Management Tier: Data Dispatch, MDM, RDM, Data Dispatch
• Processing Tier: Spring XD, Pivotal GemFire XD, HAWQ
• Unified Sources, Flexible Actions
• Data sources shown (Existing Sources, New Data Sources): Clickstream, Sensor Data, Weblogs, Network Data, CRM Data, ERP Data
• Other components shown: Pivotal GemFire, Pivotal RabbitMQ, Redis, Pivotal CF, Pivotal HD Command Center
How is a Business Data Lake Different?
Criteria compared: Business Data Lake vs. EDW
• Common data model
– Business Data Lake: base class = standard data; derived classes = local data
– EDW: single class = single view across the enterprise
• Data quality
– Business Data Lake: full spectrum
– [remainder of this row was a graphic of 0s and 1s]
• Data integration
– [graphic only]
• Multiple interfaces
– Business Data Lake: SQL, SAS, R, MapReduce, NoSQL
– EDW: SQL access, integration with SAS, R and other analytical interfaces
• Mixed workload with varying QoS
– Business Data Lake: support low latency, interactive and batch
– EDW: limited QoS separation required
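The "base class = standard data, derived classes = local data" idea from the common data model row can be sketched in a few lines of Python. This is only an illustration: the `Customer` and `RetailCustomer` classes, their fields, and the `corporate_view` helper are hypothetical names, not part of any Pivotal product.

```python
from dataclasses import dataclass, asdict, fields


@dataclass
class Customer:
    """Corporate standard view: fields every business unit shares (hypothetical)."""
    customer_id: str
    name: str
    country: str


@dataclass
class RetailCustomer(Customer):
    """Local view for one business unit: standard fields plus local extras (hypothetical)."""
    loyalty_tier: str = "none"
    preferred_store: str = ""


def corporate_view(record: Customer) -> dict:
    """Project any local record onto the shared, governed fields only."""
    standard = {f.name for f in fields(Customer)}
    return {k: v for k, v in asdict(record).items() if k in standard}


local = RetailCustomer("c-1", "Ada", "BR", loyalty_tier="gold", preferred_store="s-9")
shared = corporate_view(local)  # only customer_id, name and country survive
```

The local team keeps its extra fields, while anything shared across the enterprise is reduced to the base-class fields, which is where governance would apply.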
Introducing Pivotal HD 2.0
• Foundation for Business Data Lake
• World’s Most Advanced Real-Time Analytics Platform
• Most Extensive Set of Advanced Analytical Toolsets
Pivotal HD Architecture
Diagram legend: Apache, Pivotal
• Apache Hadoop components: HDFS, HBase, Pig, Hive, Mahout, MapReduce, Sqoop, Flume
• Resource Management & Workflow: YARN, ZooKeeper, Oozie
• Pivotal HD Enterprise: Command Center (Configure, Deploy, Monitor, Manage)
• HAWQ – Advanced Database Services: Xtension Framework, Catalog Services, Query Optimizer, Dynamic Pipelining, ANSI SQL + Analytics
• Pivotal GemFire XD – Real-Time Database Services: Distributed In-memory Store, Query, Transactions, Ingestion, Processing, Hadoop Driver – Parallel with Compaction, ANSI SQL + In-Memory
• Also shown: Spring XD, Spring, MADlib Algorithms, Virtual Extensions, GraphLab, Open MPI
New Apache Hadoop Features in Pivotal HD 2.0
• Apache Hadoop 2.2 enables enterprise operationalization features such as NFS and Snapshots
• Hive 0.12 is faster, has better scalability, and broader SQL data type support
• Pig 0.12 (incl. PiggyBank) increases productivity and appeal for a broader set of users
• HBase 0.96 improves mean time to recovery (MTTR) and adds modularization for easier upgrades and reduced dependencies
Hadoop at the Center
Enabling the Data-Driven Enterprise
• Hadoop as a Service: Big Data On-Demand
• GemFire XD: In-Memory Real-time Analytics
• Spring XD: Building Big Data Apps
• Open Source Algorithm Libraries
• Chorus: Big Data Collaboration
• Fastest SQL Query Engine
Real-Time Analytics
• Adds fast data ingest, real-time event processing and query performance, enabling SQL users to rapidly analyze and react to high volumes of events on HDFS
• Enables the creation of low-latency, scale-out OLTP applications integrated out of the box with a big data store
• Creates a single platform for analytics and OLTP, removing the need for an ETL process
• Supports changes to database tables while still complying with the immutability of HDFS
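One common way to offer updatable tables on write-once storage is to append changes as new immutable segments and periodically compact them. The sketch below illustrates that general log-structured technique only; it is not GemFire XD's implementation, and the `AppendOnlyTable` class and its methods are hypothetical.

```python
TOMBSTONE = object()  # marker meaning "this row was deleted"


class AppendOnlyTable:
    """Row updates over write-once segments (illustrative sketch, hypothetical API)."""

    def __init__(self):
        self.segments = []  # each segment is an immutable tuple of (key, value) pairs
        self.buffer = {}    # in-memory changes not yet flushed

    def put(self, key, value):
        self.buffer[key] = value

    def delete(self, key):
        self.buffer[key] = TOMBSTONE

    def flush(self):
        """Write buffered changes as a NEW segment; existing segments are never modified."""
        if self.buffer:
            self.segments.append(tuple(self.buffer.items()))
            self.buffer = {}

    def get(self, key):
        """Newest change wins: check the buffer, then segments from newest to oldest."""
        if key in self.buffer:
            value = self.buffer[key]
            return None if value is TOMBSTONE else value
        for segment in reversed(self.segments):
            for k, v in segment:
                if k == key:
                    return None if v is TOMBSTONE else v
        return None

    def compact(self):
        """Merge all segments into one new segment, dropping stale versions and deletes."""
        latest = {}
        for segment in self.segments:  # oldest to newest, so later writes overwrite
            latest.update(segment)
        live = tuple((k, v) for k, v in latest.items() if v is not TOMBSTONE)
        self.segments = [live] if live else []


table = AppendOnlyTable()
table.put("a", 1)
table.put("b", 10)
table.flush()          # segment 1
table.put("a", 2)      # "update" is just a newer record
table.delete("b")      # "delete" is a tombstone record
table.flush()          # segment 2
```

Reads see the latest value even though no segment was ever rewritten in place; `compact()` trades one extra write for fewer segments to scan.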
Real-Time Data Services on Pivotal HD
Diagram labels: Pivotal GemFire XD; HAWQ; Pivotal Extension Framework; MapReduce I/P & O/P Formatter; Native Persistence; Command Center; Model Refresh; Re-evaluate Model; Online Apps; Analytic Apps; Sensor Data / Feeds; Pivotal HD Enterprise; Shared Data; HDFS
Pivotal GemFire XD 1.0 Major Features
Enterprise real-time data processing platform for SLA critical applications; enables users to rapidly and reliably analyze and react to high volumes of events while leveraging 10s of TBs of in-memory reference data.
Cloud Scale Real-Time Platform
• Very low & predictable latencies at high and variable loads
• 10s of TBs in-memory (MemScale)
• Multi-tiered caching
• Real-time event processing
• Rolling upgrade support
Optimized for Real-Time Analytics
• SQL-based queries
• Support structured data
• Java stored procedures
• Deep Spring Data integration
Seamless Pivotal HD Integration
• Scale to HDFS with policy driven in-memory data retention
• Online and offline querying of HDFS data
• ETL-less bi-directional integration with other Pivotal HD services
• Pivotal Extension Framework Integration
• ICM Integration
Enterprise-Class Reliability
• Distributed transactions (JTA)
• HA through in-memory redundancy
• Active-active deployments across WAN
• JMX based scalable management
• Visual monitoring through Pulse
Deep Scalable Analytics
• User Defined Functions: PL/R, PL/Java and PL/Python enable writing UDFs in additional languages that execute inside the database, improving performance
• Parquet columnar open storage format delivers significant performance and scalability improvements
• Richer set of open source machine learning algorithms helps conduct rapid data science experiments on relational data
Deep Scalable Analytics
Provides data-parallel implementations of mathematical, statistical and machine-learning methods for structured and unstructured data.
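To make "data-parallel" concrete, here is a minimal sketch of simple linear regression computed the way a parallel aggregate might do it: each segment folds its own rows into a small tuple of sufficient statistics, the tuples are merged, and a final step solves for the coefficients. The function names `transition`, `merge` and `final` and the sample data are assumptions for illustration, not a library API.

```python
from functools import reduce


def transition(rows):
    """Per-segment pass: fold local (x, y) rows into (n, Sx, Sy, Sxy, Sxx)."""
    n = sx = sy = sxy = sxx = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxy += x * y
        sxx += x * x
    return (n, sx, sy, sxy, sxx)


def merge(a, b):
    """Combine two partial states; order does not matter, so segments can run in parallel."""
    return tuple(p + q for p, q in zip(a, b))


def final(stats):
    """Solve the least-squares normal equations from the merged statistics."""
    n, sx, sy, sxy, sxx = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Three hypothetical segments, each holding part of a table where y = 2x + 1.
segments = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0)], [(3.0, 7.0), (4.0, 9.0)]]
slope, intercept = final(reduce(merge, map(transition, segments)))
```

Only five numbers per segment cross the network, regardless of how many rows each segment holds, which is what lets this style of algorithm scale with the data.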
Deep Scalable Analytics
• HAWQ 1.2
• Linear Regression
• Logistic Regression
• Multinomial Logistic Regression
• K-Means
• Association Rules
• Latent Dirichlet Allocation
• Naïve Bayes
• Elastic Net Regression
• Decision Trees / Random Forest
• Support Vector Machines
• Cox Proportional Hazards Regression
• Descriptive Statistics
• ARIMA
PivotalR vs. PL/R
PivotalR
• Interface is R client
• Execution is in database
• Parallelism handled by PivotalR
• Supports a portion of R
PL/R
• Interface is SQL client
• Execution is in R
• Parallelism via SQL function invocation
• Supports all of R
HAWQ: SQL on Hadoop, Format Agnostic
Pivotal HD: HDFS Data Lake
Future formats …
ANSI SQL
HAWQ Continues to Soar
• NameNode High Availability (HA) support improves availability of query processing with full Hadoop fault tolerance
• Error Table helps to debug data errors
• Parquet file format: columnar data storage for HDFS
• HAWQ expansion increases performance (concurrency/throughput) by expanding query processing to newly added data nodes
NameNode HA Support
• Feature:
– Automatic failover to the standby NameNode when the active NameNode fails
• Benefits:
– Fully fault tolerant to NameNode failures
– Improved availability of query processing
– Integrated into Hadoop availability model
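From a client's point of view, failover means a request that fails against one NameNode is retried against the other without the caller noticing. The sketch below models only that behaviour; `FakeNameNode`, `FailoverClient` and `NameNodeDown` are hypothetical stand-ins and do not reflect the Hadoop or HAWQ client code.

```python
class NameNodeDown(Exception):
    """Raised when a (simulated) NameNode cannot serve a request."""


class FakeNameNode:
    """Stand-in for a NameNode; 'healthy' is toggled to simulate a crash."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def get_block_locations(self, path):
        if not self.healthy:
            raise NameNodeDown(self.name)
        return f"{self.name}:{path}"


class FailoverClient:
    """Tries the current NameNode first and moves to the next one on failure."""

    def __init__(self, namenodes):
        self.namenodes = list(namenodes)
        self.current = 0

    def get_block_locations(self, path):
        for _ in range(len(self.namenodes)):
            node = self.namenodes[self.current]
            try:
                return node.get_block_locations(path)
            except NameNodeDown:
                # Remember the switch so later calls go straight to the live node.
                self.current = (self.current + 1) % len(self.namenodes)
        raise NameNodeDown("no NameNode reachable")


nn1, nn2 = FakeNameNode("nn1"), FakeNameNode("nn2")
client = FailoverClient([nn1, nn2])
before = client.get_block_locations("/data/part-0")  # served by nn1
nn1.healthy = False                                  # simulate the active node failing
after = client.get_block_locations("/data/part-0")   # transparently served by nn2
```

A query engine built on such a client keeps running through a NameNode failure, which is the availability benefit listed above.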
Error Table
• Feature:
– System table for storing non-conforming data
• Benefits:
– Eliminates erroneous data load
– Reduces retries during load
– Helps to debug errors in data structures
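The error-table idea can be illustrated with a small loader that routes non-conforming rows to a side table instead of aborting the load. This is a sketch under assumptions: the three-column layout, the `|` delimiter and the `load_with_error_table` function are invented for the example and are not HAWQ syntax.

```python
def load_with_error_table(raw_lines, delimiter="|"):
    """Parse rows of (int id, str name, float price); divert bad rows to an error table."""
    loaded, error_table = [], []
    for line_no, line in enumerate(raw_lines, start=1):
        fields = line.split(delimiter)
        try:
            if len(fields) != 3:
                raise ValueError(f"expected 3 columns, got {len(fields)}")
            row = (int(fields[0]), fields[1], float(fields[2]))
        except ValueError as exc:
            # Keep the raw text and the reason so the source data can be debugged later.
            error_table.append({"line": line_no, "raw": line, "error": str(exc)})
            continue
        loaded.append(row)
    return loaded, error_table


lines = ["1|widget|9.50", "two|gadget|3.00", "3|gizmo", "4|doohickey|1.25"]
rows, errors = load_with_error_table(lines)
```

The load finishes in a single pass: good rows land in `rows`, and `errors` records which input lines failed and why, so there is no need to retry the whole load after fixing one bad record.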
Parquet
• Features:
– Open storage format
– Hybrid row/column open storage format
– Configurable Parquet or AO/CO format support
– Compression Type: Snappy and Gzip
– Additional data type support
– Parquet Input Format Reader API
• Benefits:
– Delivers significant performance and scalability improvements
– Industry standard compression: Saves storage
– Usable in MapReduce/Hive workloads
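"Hybrid row/column" can be pictured as splitting a table horizontally into row groups and then storing each group column by column. The following sketch shows only that layout idea with plain Python lists; it does not read or write real Parquet files, and `to_row_groups` and `scan_column` are hypothetical helpers.

```python
def to_row_groups(rows, columns, group_size):
    """Split rows into groups, then store each group as one list per column."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        groups.append({col: [r[i] for r in chunk] for i, col in enumerate(columns)})
    return groups


def scan_column(groups, column):
    """Read a single column across all row groups without touching the others."""
    for group in groups:
        yield from group[column]


rows = [(1, "BR", 10.0), (2, "US", 12.5), (3, "BR", 7.0), (4, "DE", 3.5), (5, "US", 1.0)]
groups = to_row_groups(rows, ["id", "country", "amount"], group_size=2)
total = sum(scan_column(groups, "amount"))  # only the "amount" lists are read
```

An aggregate over one column touches a fraction of the stored data, and values of the same type sit next to each other, which is also what makes compression such as Snappy or Gzip effective.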
HAWQ Expansion
• Features:
– Expand HAWQ nodes to additional DataNodes
– Expand # of segments per HAWQ segment host
• Benefits:
– Expand query processing
– Increase performance by utilizing maximum CPU/resources
– Increased concurrency/throughput
Big Computing and Graph Analytics
• Open MPI, one of the most mature parallel computing frameworks, is now available within HDFS, eliminating costly data movement and shortening data science cycles
• GraphLab is a graph-based library of machine learning algorithms that lets data scientists and analysts leverage popular algorithms such as PageRank, collaborative filtering and computer vision in HDFS
Background
• Hadoop MapReduce is not a good fit for iterative applications (such as graph computing and machine learning)
• Users need to build separate systems/clusters to support those applications
• MPI is one of the most mature and widely used parallel computing frameworks
– MPI = Big Computing, Hadoop = Big Data
– MPI + Hadoop = Big Computing + Big Data
MPI Background
• What is MPI?
– “a standardized and portable message-passing system designed by a group of researchers from academia and industry to function on a wide variety of parallel computers” (Wikipedia)
• What is Open MPI?
– One of the most popular implementations of MPI, community supported
• What is Hamster?
– “Hadoop And Mpi on the same cluSTER”
GraphLab
• Topic Modeling contains applications like LDA, which can be used to cluster documents and extract topical representations.
• Graph Analytics contains applications like PageRank and triangle counting, which can be applied to general graphs to estimate community structure.
• Clustering contains standard data clustering tools such as k-means.
• Collaborative Filtering contains a collection of applications used to make predictions about users’ interests and factorize large matrices.
• Graphical Models contains tools for reasoning about structured noisy data.
• Computer Vision contains a collection of tools for reasoning about images.
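PageRank, mentioned in the list above, is a good example of the iterative style of computation that graph toolkits target: the same update is applied to every vertex until the scores settle. The sketch below is a plain single-machine power iteration, not GraphLab code; the `pagerank` function, its parameters and the tiny sample graph are assumptions, and every link target is assumed to appear as a key in the graph.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of {node: [outgoing links]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v, out_links in graph.items():
            if out_links:
                share = damping * rank[v] / len(out_links)
                for w in out_links:
                    new_rank[w] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for w in nodes:
                    new_rank[w] += damping * rank[v] / n
        rank = new_rank
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
top = max(ranks, key=ranks.get)  # node with the highest score
```

Each round needs the full result of the previous round, which is why running such loops as a chain of separate MapReduce jobs is awkward and why in-memory, iteration-friendly frameworks are attractive here.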
Pivotal HD: Built for Data Science
• Data Science on Pivotal HD
• Relational Advanced Analytics
• Graph Advanced Analytics
• Languages: SQL, R, Python, Java
• Custom Analytic Functions - UDFs
World’s Leading Experts
Pivotal Labs – Pivotal Data Labs
On Demand Services
Pivotal Data Dispatch
Unlimited Pivotal HD
• BATCH
• INTERACTIVE: HAWQ, Greenplum DB
• REAL-TIME: GemFire XD, GemFire | SQLFire
Pivotal Enables Hadoop Market Adoption
• Data Lakes: Unify Unstructured and Structured Data Access
• Big Data Apps: Build analytic and transaction-led applications impacting top line revenue
• Data-Driven Enterprise: App Dev and Operational Management on HDFS Data Architecture
Pivotal HD 2.0 and Isilon Update
• Isilon aligns with our Enterprise Grade Message
• Pivotal Command Center 2.2 (part of Pivotal HD 2.0)
– Works with Pivotal HD 1.1.1
– ‘Down’ status of HDFS is removed when Isilon is configured
• Isilon has accelerated their integration from Q4 to Q3 for HDFS 2.2
Large Mid-Market Financier Builds Foundation to Store All Data of Interest, Convert Insights to Value-added Services
Challenge:
• Mid-market financier seeks to maintain high margins through value-added services
• Realized that critical insights could come from many sources, but much was deleted due to storage cost
• Frustrated by lack of ability to blend the data fabric, build analytics on top of it, and create applications on top of that
Solution:
• Data Lake provides access to any information of interest through a familiar SQL-like interface
• Provides a foundation for creating analytics and applications as value-added services: forecasting demand based on social media sentiment, analytics on fleet vehicle usage
Major TV Network Replaces Teradata with Pivotal
Builds Infrastructure to Capture $40 Million in Untapped Revenue
Challenge:
• Ad inventory is an inherently perishable product and is subject to an inefficient, “traditional” selling process.
• Upward trend in volume and traffic due to higher ad quality and mobile devices.
• Inability to react: 7-hour lag time in communication between ad fulfillment and sales teams, exacerbated by major broadcast events.
Solution:
• Reduced 7-hour lag time to under 1 hour, enabling network sales to communicate delivered impressions, forecast spend inventory and sell more effectively
• Maximized profit by selling across brands/channels, allowing the network to better leverage non-premium inventory
Home Appliance Maker Lays Foundation for “Smart” Connected Devices, Big Data-based Decisions
Challenge:
• Prepare for next-generation appliances: “smart” connected devices, controlled by mobile phone
• Siloed environment including Teradata, SAS and HP made it difficult to derive true insights across disparate data
Solution:
• Enable innovation and improve service performance through appliances that provide feedback based on output and environmental factors
• Improve marketing efficiency with targeted campaigns based on market demographics and buying indicators
• Better understand requirements for parts inventory based on current appliance lifecycles
National Healthcare Organization Replaces Aging IBM Platform, Seeds Data Lake as Hadoop Beachhead
Challenge:
• Aging IBM infrastructure could not support new SAS Access and Visual Analytics technology
• Interest in enabling infrastructure to support a for-profit healthcare analytics-as-a-service business
• Sought to provide refined data sets to other insurance companies for their own research, and needed a way to cleanse the data
Solution:
• Stepwise evolution of the platform onto GPDB, one of two certified platform partners for running visual analytics
• Established data lake as the platform for upload, cleansing and conversion of private data into publicly consumable datasets
Aviation: Predictive Maintenance
Challenge:
• An airplane’s comprehensive “gate to gate” flight data didn’t exist in a single place for reporting
• Each individual flight can generate approximately 1 TB of data, which is economically infeasible to keep in a traditional EDW
• To maintain profitability of GE Aviation's Contract Service Agreements, new analytical methods and approaches were required
Solution:
• Ingest all data to a data lake for data discovery and model development, to increase wing time and aircraft uptime and improve customer satisfaction and airline profitability
• Improved capacity for preventative maintenance rather than remediation, reducing expense and liability
Pivotal Solution includes: GPDB, PHD, Alpine, Chorus
Brazilian Telco Provider Establishes Foundation for Data-Driven Culture
Challenge:
• Poor call quality caused massive loss of customers, with no insight into the root cause of issues
• Increased scrutiny from regulators, but infrastructure did not support the requests for information needed
• Difficulties with scale: call data records grow by 2 billion new records per day, with no information on dropped calls due to capacity
Solution:
• New data warehouse infrastructure contains both dropped and completed calls for analysis, with 3-month capacity
• Hadoop infrastructure with familiar SQL interface stores 5x the volume at half the cost of Teradata
• Reports which took 2 months to obtain now take 1 day
Pivotal Solution includes: PHD, HAWQ
Pivotal HD 2.0 Summary
• The Foundation for Business Data Lake
• The World’s Most Advanced Hadoop Stack
– Pivotal HD now based on Apache Hadoop 2.2
– Real-time SQL, in-memory over Pivotal HD and integrated into Spring: Pivotal GemFire XD
– Enhanced Interactive SQL over Pivotal HD: HAWQ
• World’s Most Advanced Big Data Analytic Platform
– Most extensive set of machine learning libraries: MADlib, R and GraphLab
Pivotal HD 2.0 demo: Pivotal Booth
Thank You
Transform Your Business with Big Data and Hortonworks Pactera_US
 
Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lakeEMC
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks
 
Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Mac Moore
 
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsVerizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsDataWorks Summit
 

Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform

  • 1. Deep Dive On Pivotal HD - World Class HDFS Platform. Michael Goddard. © Copyright 2014 EMC Corporation. All rights reserved.
  • 2. Agenda
      • Pivotal
      • Pivotal Business Data Lake
      • Introducing Pivotal HD 2.0
      • Pivotal HD 2.0 and Isilon Update
      • Customer Success
      • Q&A
  • 3. What Matters: Apps. Data. Analytics.
      • Apps power businesses, and those apps generate data
      • Analytic insights from that data drive new app functionality, which in turn drives new data
      • The faster you can move around that cycle, the faster you learn, innovate & pull away from the competition
  • 4. How Pivotal Gets You There
      • Uniquely positioned to help enterprises modernize each facet of this cycle today
      • Comprehensive portfolio of products & services spanning Big Data, PaaS & Agile
      • Converging these technologies into a coherent, next-gen Enterprise PaaS platform
      • Diagram labels: Pivotal Labs Agile Development; Pivotal Data Fabric; Pivotal One PaaS
  • 5. Pivotal’s Big Bets for the Future
      1. HDFS becomes the data substrate for the next generation of data infrastructures
      2. A set of integrated, consumer-grade services must evolve on top of HDFS: stream ingestion, analytical processing, and transactional serving
      3. Provisioning flexibility and elasticity become critical capabilities for this data infrastructure
  • 6. Pivotal Business Data Lake
      • Govern where it matters: focus on MDM and RDM; enforce only when sharing; treat corporate as aggregation of local
      • Encourage local requirements: let the business decide what they need; build from the bottom; enable traceability to source; disposable data views
      • Distill on demand: select only what you want; business friendly tooling; re-usable information maps; rapid change cycle
      • Store everything: store everything ‘as is’; include structured and unstructured data; store it cheaply
  • 7. Pivotal Business Data Lake Architecture (diagram)
      • Labels: Centralized Management; System monitoring; System management; Unified Data Management Tier; Data mgmt. services; MDM; RDM; Audit and policy mgmt.; Processing Tier; Workflow Management; In-memory; MPP database; Existing Sources; Unified Sources; Flexible Actions; Real-time ingestion; Micro batch ingestion; Batch ingestion; Real-time insights; Interactive insights; Batch insights; HDFS; New Data Sources
  • 8. Pivotal Business Data Lake Architecture (diagram, mapped to products)
      • Labels: Centralized Management; Unified Data Management Tier; Data Dispatch; MDM; RDM; Data Dispatch; Processing Tier; Spring XD; Pivotal GemFire XD; HAWQ; Unified Sources; Flexible Actions; Clickstream; Sensor Data; Weblogs; Network Data; CRM Data; ERP Data; Pivotal GemFire; Pivotal RabbitMQ; Redis; Pivotal CF; Pivotal HD Command Center; Existing Sources; New Data Sources
  • 9. How is a Business Data Lake Different? (comparison table; columns: Criteria, Business Data Lake, EDW)
      • Common data model. Business Data Lake: base class = standard data, derived classes = local data. EDW: single class = single view across the enterprise
      • Data quality: full spectrum
      • Data integration
      • Multiple interfaces. Business Data Lake: SQL, SAS, R, MapReduce, NoSQL. EDW: SQL access integration with SAS, R and other analytical interfaces
      • Mixed workload with varying QoS. Business Data Lake: support low latency, interactive and batch. EDW: limited QoS separation required
  • 10. Introducing Pivotal HD 2.0
      • Foundation for Business Data Lake
      • World’s Most Advanced Real-Time Analytics Platform
      • Most Extensive Set of Advanced Analytical Toolsets
  • 11. Pivotal HD Architecture (diagram)
      • Labels: HDFS; HBase; Pig, Hive, Mahout; MapReduce; Sqoop; Flume; Resource Management & Workflow; YARN; ZooKeeper; Apache; Pivotal Command Center (Configure, Deploy, Monitor, Manage); Spring XD; Pivotal HD Enterprise; Spring Xtension Framework; Catalog Services; Query Optimizer; Dynamic Pipelining; ANSI SQL + Analytics; HAWQ – Advanced Database Services; Distributed In-memory Store; Query, Transactions, Ingestion, Processing; Hadoop Driver – Parallel with Compaction; ANSI SQL + In-Memory; Pivotal GemFire XD – Real-Time Database Services; MADlib Algorithms; Oozie; Virtual Extensions; GraphLab, Open MPI
  • 12. New Apache Hadoop Features in Pivotal HD 2.0
      • Apache Hadoop 2.2 enables enterprise operationalization features such as NFS and Snapshots
      • Hive 0.12 is faster, scales better, and supports a broader set of SQL data types
      • Pig 0.12 (incl. PiggyBank) increases productivity and appeal for a broader set of users
      • HBase 0.96 improves mean time to recovery and is modularized for easier upgrades and reduced dependencies
  • 13. Hadoop at the Center: Enabling the Data-Driven Enterprise (diagram)
      • Labels: Hadoop as a Service; Big Data On-Demand; GemFire XD; In-Memory Real-time Analytics; Spring XD; Building Big Data Apps; Open Source Algorithm Libraries; Chorus; Big Data Collaboration; Fastest SQL Query Engine
  • 14. Real-Time Analytics • Adds fast data ingest, real-time event processing and query performance, enabling SQL users to rapidly analyze and react to high volumes of events on HDFS • Enables the creation of low-latency, scale-out OLTP applications integrated out of the box with a big data store • Creates a single platform for Analytics and OLTP, removing the need for an ETL process • Supports changes to database tables while still complying with the immutability of HDFS
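The last point on this slide (mutable tables over immutable HDFS files) is commonly achieved with an append-only log plus compaction. The sketch below illustrates that general technique only; it is not GemFire XD's actual implementation, and `AppendOnlyTable`, `write_batch`, `snapshot` and `compact` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: (key, value); a value of None marks a delete.
Record = Tuple[str, Optional[str]]


@dataclass
class AppendOnlyTable:
    """Mutable table view layered on immutable, append-only segments."""

    segments: List[Tuple[Record, ...]] = field(default_factory=list)

    def write_batch(self, changes: Dict[str, Optional[str]]) -> None:
        """Persist a batch of upserts/deletes as one new immutable segment."""
        # Existing segments are never modified, mirroring write-once files.
        self.segments.append(tuple(changes.items()))

    def snapshot(self) -> Dict[str, str]:
        """Replay segments oldest to newest; the latest record per key wins."""
        state: Dict[str, str] = {}
        for segment in self.segments:
            for key, value in segment:
                if value is None:
                    state.pop(key, None)
                else:
                    state[key] = value
        return state

    def compact(self) -> None:
        """Replace all segments with a single segment holding only live rows."""
        live = self.snapshot()
        self.segments = [tuple(sorted(live.items()))]


table = AppendOnlyTable()
table.write_batch({"sensor-1": "20.5", "sensor-2": "19.0"})
table.write_batch({"sensor-1": "21.0", "sensor-2": None})  # update + delete
before = table.snapshot()
table.compact()
after = table.snapshot()
```

Compaction changes the number of stored segments but not the visible table contents, which is why it can run in the background without affecting readers.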
  • 15. Real-Time Data Services on Pivotal HD (diagram) • Inputs and consumers: Sensor Data / Feeds, Online Apps, Analytic Apps • Pivotal HD Enterprise: Pivotal GemFire XD, HAWQ, Pivotal Extension Framework, Command Center • Paths to shared data on HDFS: MapReduce I/P & O/P Formatter, Native Persistence • Feedback loops: Model Refresh, Re-evaluate Model
  • 16. Pivotal GemFire XD 1.0 Major Features: Enterprise real-time data processing platform for SLA-critical applications; enables users to rapidly and reliably analyze and react to high volumes of events while leveraging 10s of TBs of in-memory reference data. • Cloud Scale Real-Time Platform: very low & predictable latencies at high and variable loads; 10s of TBs in-memory (MemScale); multi-tiered caching; real-time event processing; rolling upgrade support • Optimized for Real-Time Analytics: SQL-based queries; support for structured data; Java stored procedures; deep Spring Data integration • Seamless Pivotal HD Integration: scale to HDFS with policy-driven in-memory data retention; online and offline querying of HDFS data; ETL-less bi-directional integration with other Pivotal HD services; Pivotal Extension Framework integration; ICM integration • Enterprise-Class Reliability: distributed transactions (JTA); HA through in-memory redundancy; active-active deployments across WAN; JMX-based scalable management; visual monitoring through Pulse
  • 17. Deep Scalable Analytics • User Defined Functions: PL/R, PL/Java, PL/Python enable writing UDFs in additional languages that execute inside the database, improving performance • Parquet columnar open storage format delivers significant performance and scalability improvements • Richer set of open source machine learning algorithms helps conduct rapid data science experiments on relational data
  • 18. Deep Scalable Analytics: Provides data-parallel implementations of mathematical, statistical and machine-learning methods for structured and unstructured data.
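To make "data-parallel" concrete: many statistical methods can be fitted by computing small summaries on each data partition independently and then merging them. The sketch below fits a one-variable linear regression this way. It is a minimal illustration under that assumption, not the library's actual code; `Stats`, `partial_stats` and `fit` are hypothetical names.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Stats:
    """Sufficient statistics for simple linear regression y = a*x + b."""

    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def merge(self, other: "Stats") -> "Stats":
        # Merging is associative, so partitions can be combined in any order.
        return Stats(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )


def partial_stats(rows: Iterable[Tuple[float, float]]) -> Stats:
    """Runs independently on each partition (e.g. one per segment)."""
    stats = Stats()
    for x, y in rows:
        stats = stats.merge(Stats(1, x, y, x * x, x * y))
    return stats


def fit(partitions: List[List[Tuple[float, float]]]) -> Tuple[float, float]:
    """Combine per-partition statistics and solve the normal equations."""
    total = reduce(Stats.merge, (partial_stats(p) for p in partitions), Stats())
    slope = (total.n * total.sxy - total.sx * total.sy) / (
        total.n * total.sxx - total.sx ** 2
    )
    intercept = (total.sy - slope * total.sx) / total.n
    return slope, intercept


# Points on y = 2x + 1, split across three partitions.
partitions = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)], [(4.0, 9.0), (5.0, 11.0)]]
slope, intercept = fit(partitions)
```

Only five numbers per partition cross the network, regardless of how many rows each partition holds.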
  • 19. Deep Scalable Analytics (HAWQ 1.2) • Linear Regression • Logistic Regression • Multinomial Logistic Regression • K-Means • Association Rules • Latent Dirichlet Allocation • Naïve Bayes • Elastic Net Regression • Decision Trees / Random Forest • Support Vector Machines • Cox Proportional Hazards Regression • Descriptive Statistics • ARIMA
  • 20. PivotalR vs. PL/R • PivotalR: interface is R client; execution is in database; parallelism handled by PivotalR; supports a portion of R • PL/R: interface is SQL client; execution is in R; parallelism via SQL function invocation; supports all of R
  • 21. HAWQ: SQL on Hadoop, Format Agnostic (diagram) • ANSI SQL • Pivotal HD: HDFS Data Lake • Future formats …
  • 22. HAWQ Continues to Soar • NameNode High Availability (HA) support improves availability of query processing with full Hadoop fault tolerance • Error Table helps to debug data errors • Parquet file format: columnar data storage for HDFS • HAWQ expansion increases performance (concurrency/throughput) by expanding query processing to newly added data nodes
  • 23. NameNode HA Support • Feature: – Automatic failover to the standby NameNode when the active NameNode fails • Benefits: – Fully fault tolerant to NameNode failures – Improved availability of query processing – Integrated into Hadoop availability model
  • 24. Error Table • Feature: – System table for storing non-conforming data • Benefits: – Eliminates erroneous data load – Reduces retries during load – Helps to debug errors in data structures
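The error-table idea can be sketched in a few lines: rows that fail parsing are diverted to a side table with enough context to debug them, and the load continues instead of aborting. This is an illustration of the concept only, not HAWQ's implementation or syntax; `load_with_error_table`, `parse_reading` and the `reject_limit` parameter are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, float]


def parse_reading(raw: str) -> Row:
    """Parse 'name,value'; raises ValueError for non-conforming input."""
    parts = raw.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 2 fields, got {len(parts)}")
    return parts[0], float(parts[1])


def load_with_error_table(
    lines: List[str],
    parse: Callable[[str], Row],
    reject_limit: int,
) -> Tuple[List[Row], List[Dict[str, object]]]:
    """Load good rows; record bad rows instead of failing the whole load."""
    loaded: List[Row] = []
    errors: List[Dict[str, object]] = []
    for line_no, raw in enumerate(lines, start=1):
        try:
            loaded.append(parse(raw))
        except ValueError as exc:
            # Keep line number, raw text and reason so the row can be debugged.
            errors.append({"line": line_no, "raw": raw, "error": str(exc)})
            if len(errors) > reject_limit:
                raise RuntimeError("reject limit exceeded; aborting load") from exc
    return loaded, errors


lines = ["a,1.5", "b,oops", "c", "d,2"]
loaded, errors = load_with_error_table(lines, parse_reading, reject_limit=5)
```

A reject limit keeps a badly malformed file from silently filling the error table; below the limit, a single pass loads everything that conforms.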
  • 25. Parquet • Features: – Open storage format – Hybrid row/column open storage format – Configurable Parquet or AO/CO format support – Compression type: Snappy and Gzip – Additional data type support – Parquet Input Format Reader API • Benefits: – Delivers significant performance and scalability improvements – Industry-standard compression saves storage – Usable in MapReduce/Hive workloads
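A "hybrid row/column" layout splits rows into groups and stores each column of a group contiguously, so a query can read and decompress only the columns it needs. The sketch below shows that idea with gzip-compressed JSON column chunks. It is a toy illustration, not the actual Parquet file format or its reader API; `write_row_groups` and `read_column` are hypothetical names.

```python
import gzip
import json
from typing import Dict, List

RowGroup = Dict[str, bytes]  # column name -> compressed column chunk


def write_row_groups(rows: List[Dict[str, object]], group_size: int) -> List[RowGroup]:
    """Split rows into groups; inside each group, store data column by column."""
    groups: List[RowGroup] = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        group: RowGroup = {}
        for name in chunk[0]:
            values = [row[name] for row in chunk]
            # Each column chunk is compressed independently of the others.
            group[name] = gzip.compress(json.dumps(values).encode("utf-8"))
        groups.append(group)
    return groups


def read_column(groups: List[RowGroup], name: str) -> List[object]:
    """Decompress only the requested column, skipping all other columns."""
    values: List[object] = []
    for group in groups:
        values.extend(json.loads(gzip.decompress(group[name]).decode("utf-8")))
    return values


rows = [{"id": i, "city": "Boston" if i % 2 else "Austin"} for i in range(6)]
groups = write_row_groups(rows, group_size=4)
cities = read_column(groups, "city")
```

Values within one column tend to resemble each other, which is part of why columnar chunks usually compress well.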
  • 26. HAWQ Expansion • Features: – Expand HAWQ nodes to additional DataNodes – Expand the number of segments per HAWQ segment host • Benefits: – Expand query processing – Increase performance by utilizing maximum CPU/resources – Increased concurrency/throughput
  • 27. Big Computing and Graph Analytics • Open MPI, one of the most mature parallel computing frameworks, is now available within HDFS, eliminating costly data movement and shortening data science cycles • GraphLab is a graph-based library of machine learning algorithms, allowing Data Scientists and Analysts to leverage popular algorithms such as PageRank, collaborative filtering and computer vision in HDFS
  • 28. Background • Hadoop MapReduce is not a good fit for iterative applications (like graph computing, machine learning, etc.) • Users need to build separate systems/clusters to support those applications • MPI is (one of) the most mature/used parallel computing frameworks – MPI = Big Computing, Hadoop = Big Data – MPI + Hadoop = Big Computing + Big Data
  • 29. MPI Background • What is MPI? – “a standardized and portable message-passing system designed by a group of researchers from academia and industry to function on a wide variety of parallel computers” (Wikipedia) • What is Open MPI? – One of the most popular implementations of MPI, community supported • What is Hamster? – “Hadoop And Mpi on the same cluSTER”
  • 30. GraphLab • Topic Modeling contains applications like LDA, which can be used to cluster documents and extract topical representations. • Graph Analytics contains applications like PageRank and triangle counting, which can be applied to general graphs to estimate community structure. • Clustering contains standard data clustering tools such as k-means. • Collaborative Filtering contains a collection of applications used to make predictions about users' interests and factorize large matrices. • Graphical Models contains tools for reasoning about structured noisy data. • Computer Vision contains a collection of tools for reasoning about images.
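PageRank is a good example of the iterative workloads these slides describe: every pass re-reads the whole graph and feeds its output into the next pass, which is costly when each pass is a separate MapReduce job. The sketch below is a plain single-process power iteration, not the GraphLab API; the function name, the default damping factor of 0.85 and the sample graph are assumptions made for illustration.

```python
from typing import Dict, List


def pagerank(
    graph: Dict[str, List[str]],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Power iteration; every link target must also appear as a key in graph."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each pass with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank  # the output of this pass is the input of the next
    return rank


graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
ranks = pagerank(graph)
top = max(ranks, key=ranks.get)
```

Here `a` ends up with the highest rank because it is the only node with two incoming links.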
  • 31. Pivotal HD: Built for Data Science (diagram) • Data Science on Pivotal HD: Relational Advanced Analytics, Graph Advanced Analytics • Languages: SQL, R, Python, Java • Custom Analytic Functions - UDFs
  • 32. World’s Leading Experts: Pivotal Labs – Pivotal Data Labs • On Demand Services: Pivotal Data Dispatch • Batch: Unlimited Pivotal HD • Interactive: HAWQ, Greenplum DB • Real-time: GemFire XD, GemFire | SQLFire
  • 33. Pivotal Enables Hadoop Market Adoption • Data Lakes: Unify Unstructured and Structured Data Access • Big Data Apps: Build analytic and transaction-led applications impacting top line revenue • Data-Driven Enterprise: App Dev and Operational Management on HDFS Data Architecture
  • 34. Pivotal HD 2.0 and Isilon Update • Isilon aligns with our Enterprise Grade Message • Pivotal Command Center 2.2 (part of Pivotal HD 2.0) – Works with Pivotal HD 1.1.1 – ‘Down’ status of HDFS is removed when Isilon is configured • Isilon has accelerated their integration from Q4 to Q3 for HDFS 2.2
  • 35. Large Mid-Market Financier Builds Foundation to Store All Data of Interest, Convert Insights to Value-added Services Challenge: • Mid-market financier seeks to maintain high margins through value-added services • Realized that critical insights could come from many sources, but much was deleted due to storage cost • Frustrated by inability to blend a data fabric, build analytics on top of it, and create applications on top of those Solution: • Data Lake provides accessibility of any information of interest through a familiar SQL-like interface • Provides foundation for creation of Analytics and Applications as value-added services: forecast demand based on social media sentiment, analytics on fleet vehicle usage
  • 36. Major TV Network Replaces Teradata with Pivotal, Builds Infrastructure to Capture $40 Million in Untapped Revenue Challenge: • Ad inventory is an inherently perishable product, and is subject to an inefficient, “traditional” selling process. • Upward trend in volume and traffic due to higher ad quality, mobile devices. • Inability to react: 7-hour lag time in communication between ad fulfillment and sales teams, exacerbated by major broadcast events. Solution: • Reduced 7-hour lag time to under 1 hour, enabling network sales to communicate delivered impressions, forecast spend inventory and sell more effectively • Maximized profit by selling across brands/channels, allowing network to better leverage non-premium inventory
  • 37. Home Appliance Maker Lays Foundation for “Smart” Connected Devices, Big Data-based Decisions Challenge: • Prepare for next-generation appliances: “smart” connected devices, controlled by mobile phone • Siloed environment including Teradata, SAS, HP made it difficult to derive true insights across disparate data Solution: • Enable innovation, improve service performance through appliances that provide feedback based on output, environmental factors • Improve marketing efficiency with targeted campaigns based on market demographics, buying indicators • Better understand requirements for parts inventory based on current appliances' lifecycle
  • 38. National Healthcare Organization Replaces Aging IBM Platform, Seeds Data Lake as Hadoop Beachhead Challenge: • Aging IBM Infrastructure could not support new SAS Access and Visual Analytics Technology • Interest in enabling infrastructure to support for-profit healthcare analytics as a service business • Sought to provide refined data sets to other insurance companies for their own research, needed way to cleanse data Solution: • Stepwise evolution of platform onto GPDB, one of two certified platform partners for running visual analytics • Established data lake as platform for upload, cleansing and conversion of private data into publicly consumable datasets
  • 39. Aviation: Predictive Maintenance Challenge: • An airplane’s comprehensive “gate to gate” flight data didn’t exist in a single place for reporting • Each individual flight can generate approximately 1 TB of data, which is economically infeasible to store in a traditional EDW • To maintain profitability of GE Aviation's Contract Service Agreements, new analytical methods and approaches were required Solution: • Ingest all data to a data lake for data discovery and model development, to increase wing time and aircraft uptime and to improve customer satisfaction and airline profitability • Improved capacity for preventative maintenance rather than remediation, reducing expense and liability Pivotal Solution includes: GPDB, PHD, Alpine, Chorus
  • 40. Brazilian Telco Provider Establishes Foundation for Data-Driven Culture Challenge: • Poor call quality caused massive loss of customers. No insight into root cause of issues. • Increased scrutiny from regulators, but infrastructure did not support the requests for information needed • Difficulties with scale: Call Data Records generate 2 billion new records per day, no info on dropped calls due to capacity Solution: • New Data Warehouse infrastructure contains both dropped and completed calls for analysis, 3-month capacity • Hadoop infrastructure with familiar SQL interface stores 5x volume at half the cost of Teradata • Reports which took 2 months to obtain now take 1 day Pivotal Solution includes: PHD, HAWQ
  • 41. Pivotal HD 2.0 Summary • The Foundation for Business Data Lake • The World’s Most Advanced Hadoop Stack – Pivotal HD now based on Apache Hadoop 2.2 – Real-time SQL, in-memory over Pivotal HD and integrated into Spring: Pivotal GemFire XD – Enhanced Interactive SQL over Pivotal HD: HAWQ • World’s Most Advanced Big Data Analytic Platform – Most extensive set of machine learning libraries: MADlib, R and GraphLab
  • 42. Pivotal HD 2.0 demo: Pivotal Booth
  • 43. Thank You