© Copyright 2013 Pivotal. All rights reserved.
A NEW PLATFORM FOR A NEW ERA
Pivotal Confidential–Internal Use Only
Pivotal HD
Pivotal HD Architecture
[Architecture diagram. Legend: Apache components; Pivotal HD added value.]
• Apache: HDFS, HBase, Pig, Hive, Mahout, MapReduce, Sqoop, Flume
• Resource Management & Workflow: YARN, ZooKeeper
• Pivotal HD Enterprise: Command Center (configure, deploy, monitor, manage), Hadoop Virtualization (HVE), Data Loader
• HAWQ – Advanced Database Services: Xtension Framework, Catalog Services, Query Optimizer, Dynamic Pipelining, ANSI SQL + Analytics
Pivotal HD Components
• HDFS – The Hadoop Distributed File System acts as the storage layer for Hadoop
• MapReduce – Parallel processing framework used for data computation in Hadoop
• Hive – Structured data warehouse implementation for data in HDFS that provides a SQL-like interface to Hadoop
• Pig – High-level procedural language for data pipeline/data flow processing in Hadoop
• HBase – NoSQL, key-value data store on top of HDFS
• Mahout – Library of scalable machine-learning algorithms
• Spring Hadoop – Integrates the Spring framework into Hadoop
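The MapReduce entry above names the processing model without showing it. The following is a minimal single-process sketch of the map, shuffle, and reduce steps using a word count; it is an illustration only, does not use the Hadoop API, and every function name in it (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key.

    In Hadoop the framework does this between map and reduce;
    here it is a plain in-memory dictionary.
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["pig hive hbase", "hive hbase", "hbase"]))
```

In a real cluster the map and reduce calls run in parallel on different nodes over data stored in HDFS; the sketch keeps only the data flow.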
Pivotal HD Value-Added Components
GPHD Includes…
• Installation and Configuration Manager (ICM) – cluster installation, upgrade, and expansion tools.
• GP Command Center – visual interface for cluster health, system metrics, and job monitoring.
• Hadoop Virtualization Extension (HVE) – enhances Hadoop to support virtual node awareness and enables greater cluster elasticity.
• GP Data Loader – parallel loading infrastructure that supports “line speed” data loading into HDFS.
• Isilon Integration – extensively tested at scale with guidelines for compute-heavy, storage-heavy, and balanced configurations.
Pivotal HD Adds the Following to GPHD…
• Advanced Database Services (HAWQ) – high-performance, “True SQL” query interface running within the Hadoop cluster.
• Extensions Framework (GPXF) – support for HAWQ interfaces on external data providers (HBase, Avro, etc.).
• Advanced Analytics Functions (MADlib) – ability to access parallelized machine-learning and data-mining functions at scale.
Pivotal Core Components & Versions
Component        GPHD 1.2 Core Distribution    Pivotal HD Enterprise
Hadoop           1.0.3                         2.0.2
HBase            0.92.1                        0.94.2
Hive             0.8.1                         0.9.1
Mahout           0.6                           0.8.0
Pig              0.9.2                         0.10.0
ZooKeeper        3.3.5                         3.4.5
Flume            1.2.0                         1.3.1
Sqoop            1.4.1                         1.4.2
Spring Hadoop    (no version given)            1.0.0
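The two version columns are easier to compare programmatically. The sketch below copies the version numbers from the slide and classifies each change; the labels "major", "minor", and "patch" are an assumption borrowed from semantic-versioning conventions, not terminology from the deck, and the helper names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Version numbers as listed on the slide. Spring Hadoop has no
# GPHD 1.2 version in the source, so it is recorded as None.
GPHD_1_2: Dict[str, Optional[str]] = {
    "Hadoop": "1.0.3", "HBase": "0.92.1", "Hive": "0.8.1",
    "Mahout": "0.6", "Pig": "0.9.2", "ZooKeeper": "3.3.5",
    "Flume": "1.2.0", "Sqoop": "1.4.1", "Spring Hadoop": None,
}
PIVOTAL_HD: Dict[str, Optional[str]] = {
    "Hadoop": "2.0.2", "HBase": "0.94.2", "Hive": "0.9.1",
    "Mahout": "0.8.0", "Pig": "0.10.0", "ZooKeeper": "3.4.5",
    "Flume": "1.3.1", "Sqoop": "1.4.2", "Spring Hadoop": "1.0.0",
}


def parse(version: str) -> Tuple[int, int, int]:
    """Split 'a.b.c' into integers, padding short versions such as '0.6'."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))
    return parts[0], parts[1], parts[2]


def change_level(old: Optional[str], new: Optional[str]) -> str:
    """Classify a version change (labels are an illustrative assumption)."""
    if old is None or new is None:
        return "n/a"
    o, n = parse(old), parse(new)
    if o[0] != n[0]:
        return "major"
    if o[1] != n[1]:
        return "minor"
    return "patch" if o != n else "same"


def summarize() -> Dict[str, str]:
    """Map each component to the kind of version change between the columns."""
    return {name: change_level(GPHD_1_2[name], PIVOTAL_HD[name])
            for name in GPHD_1_2}


if __name__ == "__main__":
    for component, level in summarize().items():
        print(f"{component:14} {level}")
```

Comparing the parsed integers rather than the raw strings matters for Pig, where "0.10.0" sorts before "0.9.2" as text.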
DataLoader
[Diagram; recoverable labels grouped below.]
• Sources – Streams (push, pull, connectors, Flume) and Files (HDFS, NFS, HTTP, FTP, local)
• DataLoader – data source registration, data destination registration, copy strategy optimization, data copy job management, data processing
• Interfaces – Web GUI and CLI, REST APIs
• Target – HDFS
Command Center
Simple and complete cluster management
• Install and configure Hadoop components and services
• Centralized interface for Pivotal HD cluster monitoring, diagnostics, and management
• Live and historical Hadoop system metrics analysis
[Diagram: Configure, Deploy, Monitor, Manage, Analyze]
Command Center – Monitor, Manage, and Analyze
• Host-, application-, and job-level performance monitoring across the entire Pivotal HD cluster
• Visualize and analyze live and historical Hadoop cluster information through the Command Center dashboard
• Quick diagnostics of functional or performance issues
Hadoop Virtualization Extensions (HVE)
• HVE enables Hadoop to support more effective virtual deployments
• This creates the opportunity to provision and scale the compute and storage processes independently, resulting in:
  • Much better resource utilization
  • Improved resource allocation and consumption
  • Support for multi-tenancy
HAWQ Delivers
• SQL compliant
• World-class query optimizer
• Interactive query
• Horizontal scalability
• Robust data management
• Common Hadoop formats
• Deep analytics
Xtension Framework
• An advanced version of GPDB external tables
• Enables combining HAWQ data and Hadoop data in a single query
• Supports connectors for HDFS, HBase, and Hive
• Provides an extensible framework API to enable custom connector development for other data sources
Weitere ähnliche Inhalte

Was ist angesagt?

MySql to HBase in 5 Steps
MySql to HBase in 5 StepsMySql to HBase in 5 Steps
MySql to HBase in 5 StepsScott Cinnamond
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoopmarkgrover
 
A glimpse into the Future of Hadoop & Big Data
A glimpse into the Future of Hadoop & Big DataA glimpse into the Future of Hadoop & Big Data
A glimpse into the Future of Hadoop & Big DataSaurav Kumar Sinha
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystemJakub Stransky
 
Introduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeopleIntroduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeopleSpringPeople
 
Hive on kafka
Hive on kafkaHive on kafka
Hive on kafkaSzehon Ho
 
The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop MapR Technologies
 
Hadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and FutureHadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and FutureRyan Hennig
 
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...Cloudera, Inc.
 
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...DataWorks Summit
 
Hadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezHadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezJan Pieter Posthuma
 
Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep DiveHortonworks
 
Hive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenchesHive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenchesDataWorks Summit
 

Was ist angesagt? (20)

MySql to HBase in 5 Steps
MySql to HBase in 5 StepsMySql to HBase in 5 Steps
MySql to HBase in 5 Steps
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoop
 
A glimpse into the Future of Hadoop & Big Data
A glimpse into the Future of Hadoop & Big DataA glimpse into the Future of Hadoop & Big Data
A glimpse into the Future of Hadoop & Big Data
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystem
 
Introduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeopleIntroduction To Hadoop Administration - SpringPeople
Introduction To Hadoop Administration - SpringPeople
 
Hadoop in a Nutshell
Hadoop in a NutshellHadoop in a Nutshell
Hadoop in a Nutshell
 
Deep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profitDeep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profit
 
Hive on kafka
Hive on kafkaHive on kafka
Hive on kafka
 
Empower Data-Driven Organizations
Empower Data-Driven OrganizationsEmpower Data-Driven Organizations
Empower Data-Driven Organizations
 
Empower Hive with Spark
Empower Hive with SparkEmpower Hive with Spark
Empower Hive with Spark
 
The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop
 
Hadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and FutureHadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and Future
 
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
 
Envelope
Envelope Envelope
Envelope
 
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...
 
Hive Now Sparks
Hive Now SparksHive Now Sparks
Hive Now Sparks
 
Hadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezHadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to Tez
 
Stinger Initiative - Deep Dive
Stinger Initiative - Deep DiveStinger Initiative - Deep Dive
Stinger Initiative - Deep Dive
 
Hive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenchesHive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenches
 

Andere mochten auch

Caminho para definir micro-serviços
Caminho para definir micro-serviçosCaminho para definir micro-serviços
Caminho para definir micro-serviçosVictor Fonseca
 
Greenplum Database Open Source December 2015
Greenplum Database Open Source December 2015Greenplum Database Open Source December 2015
Greenplum Database Open Source December 2015PivotalOpenSourceHub
 
#PostgreSQLRussia в банке Тинькофф, доклад №1
#PostgreSQLRussia в банке Тинькофф, доклад №1#PostgreSQLRussia в банке Тинькофф, доклад №1
#PostgreSQLRussia в банке Тинькофф, доклад №1Nikolay Samokhvalov
 
Introduction to Greenplum
Introduction to GreenplumIntroduction to Greenplum
Introduction to GreenplumDave Cramer
 
Greenplum: O banco de dados open source massivamente paralelo baseado em Post...
Greenplum: O banco de dados open source massivamente paralelo baseado em Post...Greenplum: O banco de dados open source massivamente paralelo baseado em Post...
Greenplum: O banco de dados open source massivamente paralelo baseado em Post...PGDay Campinas
 
Cloud-native Data: Every Microservice Needs a Cache
Cloud-native Data: Every Microservice Needs a CacheCloud-native Data: Every Microservice Needs a Cache
Cloud-native Data: Every Microservice Needs a Cachecornelia davis
 
White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...
White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...
White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...EMC
 

Andere mochten auch (8)

Caminho para definir micro-serviços
Caminho para definir micro-serviçosCaminho para definir micro-serviços
Caminho para definir micro-serviços
 
Greenplum Database Open Source December 2015
Greenplum Database Open Source December 2015Greenplum Database Open Source December 2015
Greenplum Database Open Source December 2015
 
#PostgreSQLRussia в банке Тинькофф, доклад №1
#PostgreSQLRussia в банке Тинькофф, доклад №1#PostgreSQLRussia в банке Тинькофф, доклад №1
#PostgreSQLRussia в банке Тинькофф, доклад №1
 
Introduction to Greenplum
Introduction to GreenplumIntroduction to Greenplum
Introduction to Greenplum
 
Greenplum: O banco de dados open source massivamente paralelo baseado em Post...
Greenplum: O banco de dados open source massivamente paralelo baseado em Post...Greenplum: O banco de dados open source massivamente paralelo baseado em Post...
Greenplum: O banco de dados open source massivamente paralelo baseado em Post...
 
Greenplum Architecture
Greenplum ArchitectureGreenplum Architecture
Greenplum Architecture
 
Cloud-native Data: Every Microservice Needs a Cache
Cloud-native Data: Every Microservice Needs a CacheCloud-native Data: Every Microservice Needs a Cache
Cloud-native Data: Every Microservice Needs a Cache
 
White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...
White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...
White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...
 

Ähnlich wie Pivotal HD Platform Overview

Pivotal HAWQ 소개
Pivotal HAWQ 소개Pivotal HAWQ 소개
Pivotal HAWQ 소개Seungdon Choi
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune amrutupre
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony NguyenThanh Nguyen
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft PlatformJesus Rodriguez
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceHortonworks
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem pptsunera pathan
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsDataWorks Summit/Hadoop Summit
 
hadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxhadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxraghavanand36
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxDr.Florence Dayana
 
An Introduction-to-Hive and its Applications and Implementations.pptx
An Introduction-to-Hive and its Applications and Implementations.pptxAn Introduction-to-Hive and its Applications and Implementations.pptx
An Introduction-to-Hive and its Applications and Implementations.pptxiaeronlineexm
 
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.OW2
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthyhuguk
 
01-Introduction-to-Hive.pptx
01-Introduction-to-Hive.pptx01-Introduction-to-Hive.pptx
01-Introduction-to-Hive.pptxVIJAYAPRABAP
 

Ähnlich wie Pivotal HD Platform Overview (20)

Pivotal HAWQ 소개
Pivotal HAWQ 소개Pivotal HAWQ 소개
Pivotal HAWQ 소개
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
BIGDATA ppts
BIGDATA pptsBIGDATA ppts
BIGDATA ppts
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony Nguyen
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Azure Big data
Azure Big data Azure Big data
Azure Big data
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem ppt
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
 
hadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxhadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptx
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
 
An Introduction-to-Hive and its Applications and Implementations.pptx
An Introduction-to-Hive and its Applications and Implementations.pptxAn Introduction-to-Hive and its Applications and Implementations.pptx
An Introduction-to-Hive and its Applications and Implementations.pptx
 
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 
01-Introduction-to-Hive.pptx
01-Introduction-to-Hive.pptx01-Introduction-to-Hive.pptx
01-Introduction-to-Hive.pptx
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 

Mehr von Chiou-Nan Chen

Mehr von Chiou-Nan Chen (20)

Moving NEON to 64 bits
Moving NEON to 64 bitsMoving NEON to 64 bits
Moving NEON to 64 bits
 
64-bit Android
64-bit Android64-bit Android
64-bit Android
 
Intelligent Power Allocation
Intelligent Power AllocationIntelligent Power Allocation
Intelligent Power Allocation
 
3. v sphere big data extensions
3. v sphere big data extensions3. v sphere big data extensions
3. v sphere big data extensions
 
4. v sphere big data extensions hadoop
4. v sphere big data extensions   hadoop4. v sphere big data extensions   hadoop
4. v sphere big data extensions hadoop
 
2. hadoop
2. hadoop2. hadoop
2. hadoop
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoop
 
Emc keynote 1130 1200
Emc keynote 1130 1200Emc keynote 1130 1200
Emc keynote 1130 1200
 
Emc keynote 1030 1130
Emc keynote 1030 1130Emc keynote 1030 1130
Emc keynote 1030 1130
 
Emc keynote 0945 1030
Emc keynote 0945 1030Emc keynote 0945 1030
Emc keynote 0945 1030
 
Emc keynote 0930 0945
Emc keynote 0930 0945Emc keynote 0930 0945
Emc keynote 0930 0945
 
102 1600-1630
102 1600-1630102 1600-1630
102 1600-1630
 
102 1530-1600
102 1530-1600102 1530-1600
102 1530-1600
 
102 1430-1445
102 1430-1445102 1430-1445
102 1430-1445
 
102 1315-1345
102 1315-1345102 1315-1345
102 1315-1345
 
102 1630 1700
102 1630 1700102 1630 1700
102 1630 1700
 
102 1445 1515
102 1445 1515102 1445 1515
102 1445 1515
 
101 cd 1630-1700
101 cd 1630-1700101 cd 1630-1700
101 cd 1630-1700
 
101 cd 1600-1630
101 cd 1600-1630101 cd 1600-1630
101 cd 1600-1630
 
101 cd 1445-1515
101 cd 1445-1515101 cd 1445-1515
101 cd 1445-1515
 

Kürzlich hochgeladen

Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 

Kürzlich hochgeladen (20)

Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...

Pivotal HD Platform Overview

  • 1. A NEW PLATFORM FOR A NEW ERA (© Copyright 2013 Pivotal. All rights reserved.)
  • 2. Pivotal HD
  • 3. Pivotal HD Architecture (diagram; legend: Apache, Pivotal HD Added Value)
      • Pivotal HD Enterprise: HDFS; HBase; Pig, Hive, Mahout; MapReduce; Sqoop; Flume; Resource Management & Workflow (Yarn, Zookeeper); Command Center (Configure, Deploy, Monitor, Manage); Hadoop Virtualization (HVE); Data Loader
      • HAWQ – Advanced Database Services: Xtension Framework; Catalog Services; Query Optimizer; Dynamic Pipelining; ANSI SQL + Analytics
  • 4. Pivotal HD Components
      • HDFS – The Hadoop Distributed File System acts as the storage layer for Hadoop
      • MapReduce – Parallel processing framework used for data computation in Hadoop
      • Hive – Structured data warehouse implementation for data in HDFS that provides a SQL-like interface to Hadoop
      • Pig – High-level procedural language for data pipeline/data flow processing in Hadoop
      • HBase – NoSQL key-value data store on top of HDFS
      • Mahout – Library of scalable machine-learning algorithms
      • Spring Hadoop – Integrates the Spring framework into Hadoop
  • 5. Pivotal HD Value-Added Components (column headings: "GPHD Includes…" and "Pivotal HD Adds the Following to GPHD…")
      • Installation and Configuration Manager (ICM) – cluster installation, upgrade, and expansion tools.
      • GP Command Center – visual interface for cluster health, system metrics, and job monitoring.
      • Hadoop Virtualization Extension (HVE) – enhances Hadoop to support virtual node awareness and enables greater cluster elasticity.
      • GP Data Loader – parallel loading infrastructure that supports "line speed" data loading into HDFS.
      • Isilon Integration – extensively tested at scale with guidelines for compute-heavy, storage-heavy, and balanced configurations.
      • Advanced Database Services (HAWQ) – high-performance, "True SQL" query interface running within the Hadoop cluster.
      • Extensions Framework (GPXF) – support for HAWQ interfaces on external data providers (HBase, Avro, etc.).
      • Advanced Analytics Functions (MADLib) – ability to access parallelized machine-learning and data-mining functions at scale.
  • 6. Pivotal Core Components & Versions

      Component        GPHD 1.2 Core Distribution    Pivotal HD Enterprise
      Hadoop           1.0.3                         2.0.2
      HBase            0.92.1                        0.94.2
      Hive             0.8.1                         0.9.1
      Mahout           0.6                           0.8.0
      Pig              0.9.2                         0.10.0
      Zookeeper        3.3.5                         3.4.5
      Flume            1.2.0                         1.3.1
      Sqoop            1.4.1                         1.4.2
      Spring Hadoop    (no version given)            1.0.0
  • 7. DataLoader (diagram)
      • Streams: Push, Pull, Connectors, Flume
      • Files: HDFS, NFS, HTTP, FTP, Local
      • DataLoader: Data Source Registration; Data Destination Registration; Copy Strategy Optimization; Data Copy Job Management; Data Processing; Web GUI and CLI; REST APIs
      • Destination: HDFS
  • 8. Command Center: Simple and complete cluster management (diagram labels: Configure, Deploy, Monitor, Manage, Analyze)
      • Install and configure Hadoop components and services
      • Centralized interface for Pivotal HD cluster monitoring, diagnostics, and management
      • Live and historical Hadoop system metrics analysis
  • 9. Command Center – Monitor, Manage, and Analyze
      • Host-, application-, and job-level monitoring across the entire Pivotal HD cluster
      • Visualize and analyze live and historical Hadoop cluster information through the Command Center Dashboard
      • Quick diagnostics of functional or performance issues
  • 10. Hadoop Virtualization Extensions (HVE)
      • HVE enables Hadoop to support more effective virtual deployments
      • This creates the opportunity to provision and scale the compute and storage processes independently, resulting in:
          • Much better resource utilization
          • Improved resource allocation and consumption
          • Support for multi-tenancy
  • 11. HAWQ Delivers
      • SQL compliant
      • World-class query optimizer
      • Interactive query
      • Horizontal scalability
      • Robust data management
      • Common Hadoop formats
      • Deep analytics
  • 12. Xtension Framework (diagram: HDFS, HBase, and Hive connected through the Xtension Framework)
      • An advanced version of GPDB external tables
      • Enables combining HAWQ data and Hadoop data in a single query
      • Supports connectors for HDFS, HBase, and Hive
      • Provides an extensible framework API to enable custom connector development for other data sources
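
Slide 10 says HVE lets compute and storage scale independently, which only pays off if Hadoop can still tell which data sits close to a given compute VM. The following is a minimal sketch of that idea, not HVE's actual code. It assumes each VM has a location path of the form /rack/nodegroup/host, where the node-group level (an assumption for illustration) stands for one physical machine. The function names and the example paths are hypothetical.

```python
def distance(a, b):
    """Count the hops from two locations up to their closest common ancestor.

    Locations are slash-separated paths such as "/r1/ng1/data-vm-a".
    The middle "node group" level is an illustrative assumption that
    represents VMs sharing one physical host.
    """
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def pick_closest(reader, replicas):
    """Return the replica location with the smallest distance to the reader."""
    return min(replicas, key=lambda r: distance(reader, r))


if __name__ == "__main__":
    compute_vm = "/r1/ng1/compute-vm"
    data_vms = ["/r2/ng3/data-vm-c", "/r1/ng1/data-vm-a", "/r1/ng2/data-vm-b"]
    print(pick_closest(compute_vm, data_vms))
```

Under these assumptions, a compute VM and a data VM on the same physical machine are two hops apart, so the read stays local even though compute and storage run in separate VMs.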

Editor's notes

  1. Start with basic HD and then comment about the addition of a true SQL interface
  2. DataLoader copy strategies:
       • Uniform – Distributes copy tasks uniformly between workers to maximize throughput.
       • Data Locality Driven – For HDFS and local-disk sources. The job planner assigns copy tasks to the local or closest worker node. When deployed on the source, it assigns reads to the local worker; when deployed on the destination HDFS, it assigns writes to local nodes. The job planner gets locality information from the NameNode. Patches to HDFS schedulers: HDFS rack awareness to reduce inter-rack traffic, and local disk awareness to assign read/write MapReduce tasks to workers local to the data.
       • Dynamic – Used for loading a large number of small files. Assigns small tasks to workers and reassigns them at runtime.
       • Connection Limited – Limits the number of read connections to the source FTP/HTTP server.
       • Intelligent – Chooses the correct copy strategy based on source type and data.
  3. ICM
       What it is: GPHD Manager for Greenplum HD
         • GPHD Manager is part of the Command Center package.
         • It supports installation and default configuration of Hadoop, MapReduce, Hive, HBase, Zookeeper, Pig, and Mahout.
         • It provides a command-line interface, built on a RESTful web services API, to install, configure, and start/stop various Hadoop services.
         • It stores all metadata from the Hadoop cluster nodes and services in a PostgreSQL database to keep track of cluster configuration and runtime statistics.
       How it works
         • GPHD Manager is installed on an Admin node that is typically separate from the Hadoop cluster nodes.
         • The functionality of the GPHD Manager Admin node is exposed as REST-based web service APIs.
         • It leverages Puppet to manage the installation of Hadoop services (master/slave mode).
       Benefits
         • Provides a centralized, role-based configuration and deployment tool.
         • Includes validation: machine validation, reachability validation, and dependency validation.
         • A single GPHD Manager Admin node can manage multiple Hadoop clusters (integrated into GP Command Center in the next release).
  4. Command Center
       What it is: Application for monitoring and management of the GPHD environment
         • Web-based interface that provides standard infrastructure system metrics and Hadoop-specific metrics.
         • Designed to make a Hadoop administrator's job easier.
       How it works
         • Visualizes live and historical data in the GPCC Dashboard to display the state of the Hadoop cluster (stored in the backend GPDB).
         • GPCC provides host-level monitoring (all information specific to a particular host), application-level monitoring (HDFS information across the whole cluster), and job-level monitoring and analysis (information on particular MapReduce jobs).
       Benefits
         • Verify that the GPHD cluster is running efficiently and without problems.
         • Quickly diagnose functional or performance issues with the Hadoop cluster.
     GPSM
       What it is: GPHD System Management & Monitoring
         • Web-services component of GPHD 1.2 that allows applications to easily monitor and manage one or more Hadoop clusters.
         • GP-SM is designed to work with Greenplum Command Center as the UI.
         • Leverages GPDB to store and analyze both GPHD application and system metrics.
       How it works
         • GP-SM provides a Thrift plugin to retrieve data from GPHD.
         • Live and historical data are stored in a GPDB instance with a pre-defined schema (gpperfmon).
         • The Thrift plugin exposes its APIs via web services.
         • GPCC uses this web service to visualize live and historical data of the GPHD environment.
       Benefits
         • Serves as the backend system for Greenplum Command Center.
         • Enables users to analyze both live and historical Hadoop system information.
  5. HVE
       Topology Extensions:
         • Enable Hadoop to recognize an additional virtualization layer for read/write/balancing.
         • Enable compute/data node separation without losing locality.
       Elasticity Extensions:
         • Ability to adjust resource allocation (CPU, memory, map/reduce slots) to compute nodes.
         • Enables multiple compute VMs to share common HDFS data VMs.
       HVE – Allows HDFS to be virtualization aware.
       Serengeti – Deployment tool for virtualized Hadoop.
  6. This is the first true SQL engine for Hadoop.
  7. Xtension Framework connectors and supported formats:
       • HDFS: Delimited Text, Sequence File, GPDB Writable Format, Protocol Buffer, Avro
       • HBase: Predicate Pushdown
       • Hive: RCFile, Text File, Sequence File
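
The copy strategies in note 2 can be sketched roughly as follows. This is an illustration under stated assumptions, not DataLoader's implementation. The function names, the task and worker representations (plain strings), and the small-file thresholds are all hypothetical. The order in which `choose_strategy` checks its conditions is also an assumption, since the notes do not say how the "Intelligent" strategy decides.

```python
from itertools import cycle


def plan_uniform(tasks, workers):
    """Uniform: spread copy tasks evenly across workers (round-robin)."""
    plan = {w: [] for w in workers}
    for task, worker in zip(tasks, cycle(workers)):
        plan[worker].append(task)
    return plan


def plan_locality(tasks, workers, location_of):
    """Data locality driven: give each task to the worker on the host that
    holds the data; tasks with no local worker fall back to round-robin.

    location_of maps a task to a host name (hypothetical stand-in for the
    locality information the notes say comes from the NameNode).
    """
    plan = {w: [] for w in workers}
    fallback = cycle(workers)
    for task in tasks:
        host = location_of.get(task)
        worker = host if host in plan else next(fallback)
        plan[worker].append(task)
    return plan


def choose_strategy(source_type, file_count, avg_file_mb):
    """Intelligent: pick a strategy from source type and data shape.

    The thresholds and the check order are made up for illustration.
    """
    if source_type in ("ftp", "http"):
        return "connection_limited"
    if file_count > 1000 and avg_file_mb < 1:
        return "dynamic"
    if source_type in ("hdfs", "local"):
        return "locality"
    return "uniform"


if __name__ == "__main__":
    print(plan_uniform(["a", "b", "c", "d", "e"], ["w1", "w2"]))
    print(plan_locality(["f1", "f2", "f3"], ["n1", "n2"], {"f1": "n2", "f2": "n1"}))
    print(choose_strategy("hdfs", 5000, 0.2))
```

The "Dynamic" and "Connection Limited" strategies are only named here; the first would need a runtime work queue and the second a cap on concurrent connections, neither of which the notes describe in enough detail to sketch.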