Hadoop for Shanghai Dev Meetup
Roby Chen
1.
Hadoop
(Shanghai Developer Meetup – Sept 15, 2011)
余家昌 (Andrew Yu), EMC Greenplum
© Copyright 2011 EMC Corporation. All rights reserved.
2.
The Elephant Chase
4.
Yahoo! Hadoop use cases
• Personalized Yahoo! Homepage
• Yahoo! Mail anti-spam
• Search and Ad pipelines
• Ad inventory prediction
• Data analytics
• etc.
5.
Enterprise Use Case: “Big ETL”
Challenge: Transform Massive Data Flows Containing Data Needed for Complex Analysis
Solution: Hadoop/MapReduce as ETL fabric to load to Analytic Database
• Examples:
– Web Traffic Reduction
– Network Traffic & Performance Analysis
– Location Analytics for People and Goods
– Smart Electric Power Grid
– Genome Analysis
– Clinical Outcome Research & Analysis
• Data Sources:
– Web server & app server logs
– CDR / xDRs
– Router & Switching Subsystem Logs
– Sensor networks
• Components:
– Hadoop: Massively-parallel ingest, storage and analysis
– MapReduce: Runs multiple cascaded custom analysis / extraction on capture data
– Connectors move structured data to Analytics DB
• Hadoop’s Roles:
– Capture TBs/day of machine-generated data
– Quality: Run data quality tasks in MapReduce
– Execute MapReduce flows
– Extract/Combine data/metadata
– Move processed data to analytic DB
• Limitations & Cautions:
– Software development, More parts (Cascading/Flow), Maintainability
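The "Web Traffic Reduction" example can be pictured as a map step that filters and keys each log line, followed by a reduce step that aggregates per key. The sketch below runs that flow in plain Python on one machine. It does not use any Hadoop API, and the log layout, function names, and sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def map_log_line(line):
    """Map step: emit (path, 1) for a well-formed request line, else nothing."""
    parts = line.split()
    # Assumed layout: "<ip> <method> <path> <status>".
    # Lines that do not match are dropped, standing in for a data quality task.
    if len(parts) != 4 or not parts[3].isdigit():
        return []
    return [(parts[2], 1)]


def shuffle(pairs):
    """Group mapped values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: collapse all hits for one path into a single count."""
    return key, sum(values)


def run_flow(lines):
    mapped = [pair for line in lines for pair in map_log_line(line)]
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


# Hypothetical sample input.
sample_logs = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /index.html 200",
    "garbled line",
    "10.0.0.3 GET /cart 500",
]
hits_per_path = run_flow(sample_logs)
print(hits_per_path)
```

The reduced result is small and structured, which is the kind of output a connector would then move to the analytic database.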
6.
Enterprise Use Case: Fraud Detection
Challenge: Identify & alert on fraudulent activity patterns
Solution: Hadoop/MapReduce to filter & correlate communications
• Examples:
– ESPs: Email Fraud
– Finance/Banking: Bank Fraud
– Advertising: Click Fraud
– Telecom: Network Fraud
• Data Sources:
– Web & app server logs
– IP/Call Records
– Email Traffic
– Customer Transaction Data
– Banking/Credit Data
• Components:
– Hadoop: Massively-parallel ingest, storage and analysis
– Mahout: Machine learning tool for building fraud algorithms
– MapReduce: Rapid analysis & algorithm deployment
• Hadoop’s Role(s):
– Massive ingest of historical/real-time data
– Build/Validate model for fraud detection manually or using Mahout
– Parallel MapReduce jobs for near real-time fraud detection
• Limitations & Cautions:
– Software development, Partial Solution (not Real-time, not Interactive)
7.
Enterprise Use Case: Cluster Analysis
Challenge: Grouping a collection of data according to common similarities
Solution: Process and Refine in Hadoop and load into Analytical DB
• Examples:
– Customer segmentation
– Financial cost/risk analysis
– Patient-centric healthcare
– Financial stock classification
– Social network analysis
• Data Sources:
– Health records
– Sales data
– Human genome sequences
– Financial trading data
– Facebook/Twitter/LinkedIn
• Components:
– Hadoop: Flexible data storage as volume increases and structures vary
– MapReduce: Cascading allows data processing with minimal adjustments
– Optional: Connectors to move results to Analytic DB
• Hadoop’s Role(s):
– Flexible: Allow agile implementation of and unit testing of algorithms
– Large scale analysis in Hadoop creates more accurate groupings
– Rapid, parallel processing in MapReduce
• Limitations & Cautions:
– Software development, Complex Integration with Sources
8.
Greenplum HD: Community Edition Stack
100% Apache
• Currently supported: Hive, Pig, HBase, Zookeeper, MapReduce Framework (MapRed), Hadoop Distributed File System (HDFS)
• Future releases may include support for Oozie and Mahout
9.
Greenplum HD: Enterprise Edition Stack
100% Apache
• Enhanced Monitoring INTERFACE
• Currently supported: Hive, Pig, HBase, Zookeeper, MapReduce Framework (MapRed), Hadoop Distributed File System (HDFS)
• Future releases may include support for Oozie and Mahout
10.
Greenplum HD: Enterprise Edition
Enterprise-Ready Hadoop Platform for Unstructured Data
• Faster
– 2 – 5x faster than Apache Hadoop
• Reliable
– High Availability
– Mirroring
• Easier to Use
– NFS mountable
– System Management
11.
Greenplum Enterprise HD is Faster than Other Distributions
[Charts: DFSIO read and write throughput in MB/sec (higher is better); Terasort elapsed time in minutes for 3.5 TB (lower is better)]
Test setup: 10 node cluster, 2x Quad-Core, 24G DRAM, 12 x 1TB SATA Drives @ 7200 rpm, Quad NICs
12.
Greenplum Enterprise HD Distributed Name Node
• Fully distributed service running on all Hadoop nodes
• Automatic and transparent failover
• Persistent metadata
• Highly scalable in number of files
[Diagram: a grid of Hadoop nodes, each labeled NN]
13.
Greenplum Enterprise HD Job Tracker High Availability
• Assures business continuity
• Designed for mission critical use
– Automatic stateful restart
– Task Tracker reconnects without task loss
– Persistent completed task state
[Diagram: Greenplum Enterprise HD Distribution for Apache Hadoop, showing Enterprise HD MapReduce, Job Tracker HA, Distributed Name Node, and Enterprise HD Lockless Storage Services]
14.
Greenplum Enterprise HD Snapshots
• Intelligent Snapshots
– Automatic data deduplication
– Block sharing for space savings
• Fast and flexible
– Zero performance loss when writing to the original
• Easy to manage
– Scheduled or on-demand
– Drag and drop recovery
[Diagram: Hadoop / HBase applications and NFS applications read and write through Enterprise HD Lockless Storage Services; redirect on write for snapshot over blocks A, B, C, C’, D; Snapshot 1, Snapshot 2, Snapshot 3]
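Redirect on write with block sharing can be illustrated with a toy block store. The sketch below is a single-process Python model, not the Enterprise HD implementation. The class name, method names, and block layout are assumptions. A snapshot copies only the list of block ids, and a later write places new data in a fresh block so the snapshot keeps pointing at the old one.

```python
class Volume:
    """Toy block store: live volume and snapshots are lists of ids into one shared pool."""

    def __init__(self, contents):
        self.pool = {}        # block id -> data, shared by the live volume and all snapshots
        self.next_id = 0
        self.live = [self._store(c) for c in contents]
        self.snapshots = {}

    def _store(self, data):
        block_id = self.next_id
        self.pool[block_id] = data
        self.next_id += 1
        return block_id

    def snapshot(self, name):
        # Only the id list is copied; no block data is duplicated.
        self.snapshots[name] = list(self.live)

    def write(self, index, data):
        # Redirect on write: new data lands in a new block, the old block is left untouched.
        self.live[index] = self._store(data)

    def read(self, index, snapshot=None):
        table = self.live if snapshot is None else self.snapshots[snapshot]
        return self.pool[table[index]]


vol = Volume(["A", "B", "C", "D"])
vol.snapshot("snap1")
vol.write(2, "C'")
print(vol.read(2), vol.read(2, "snap1"))
```

After the write, the pool holds five blocks (A, B, C, C', D) rather than eight, which is the space saving the block-sharing bullet refers to.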
15.
Greenplum Enterprise HD Mirroring
• Business Continuity
– Efficient design
– Differential deltas are updated
– Data is compressed and check-summed
• Easy to manage
– Scheduled or on-demand
– Consistent point-in-time
[Diagram: Production (Datacenter 1) mirrored over WAN to Research (Datacenter 2); Production mirrored over WAN to Cloud]
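The idea of shipping only differential deltas, compressed and check-summed, can be sketched in a few lines of Python. This is an illustrative model only. The block dictionaries, function names, and the choice of SHA-256 and zlib are assumptions, and deletions on the source are not handled.

```python
import hashlib
import zlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_delta(source: dict, mirror_checksums: dict) -> dict:
    """Collect compressed payloads only for blocks that differ from the mirror."""
    delta = {}
    for block_id, data in source.items():
        digest = checksum(data)
        if mirror_checksums.get(block_id) != digest:
            delta[block_id] = (digest, zlib.compress(data))
    return delta


def apply_delta(mirror: dict, delta: dict) -> None:
    """Decompress each block and verify its checksum before updating the mirror."""
    for block_id, (digest, payload) in delta.items():
        data = zlib.decompress(payload)
        if checksum(data) != digest:
            raise ValueError(f"checksum mismatch for block {block_id}")
        mirror[block_id] = data


# Hypothetical state: block 1 changed and block 2 is new since the last mirror run.
source = {0: b"alpha", 1: b"beta-v2", 2: b"gamma"}
mirror = {0: b"alpha", 1: b"beta-v1"}
delta = build_delta(source, {k: checksum(v) for k, v in mirror.items()})
apply_delta(mirror, delta)
print(sorted(delta))
```

Only the changed and new blocks cross the simulated WAN link; the unchanged block 0 is never sent.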
16.
Greenplum Enterprise HD Direct Access Using NFS
• Simple application integration
– Leverage NFS for random read/write access
• Direct access for standard Hadoop tools
– Command line utilities
– File browsers
– Desktop utilities
[Diagram: Greenplum Enterprise HD Distribution for Apache Hadoop, showing Enterprise HD MapReduce, Job Tracker HA, Distributed Name Node, and Enterprise HD Lockless Storage Services]
17.
Greenplum Enterprise HD Simple Management
• Intuitive
• Insightful
• Complete
• One node or thousands
18.
Greenplum HD: Software Distributions
Features | Community Edition | Enterprise Edition
Apache Compatibility | 100% Apache Open Source | 100% API Compatible
Name Node High Availability | Reference Implementation | Distributed and High Availability
Job Tracker HA | Reference Implementation | HT High Availability
Name Node Scalability | NN Metadata in Memory | Distributed Name Node
Premium Support | Yes | Yes
Performance | - | 2 - 5x faster than Community Edition
Snapshots | No | Yes
Mirrors | No | Yes
NFS Mounts | No | Yes
System Management | No | Yes
Available for Ordering | May 9th 2011 | Q3
Pricing | Per Node Pricing | Per Node Pricing
19.
Greenplum HD on Data Computing Appliance
• Introducing the world’s first:
– High-performance
– Purpose-built
– Data co-processing Hadoop appliance
• Combining Greenplum Database and Greenplum Hadoop in one appliance
20.
GPDB GPHD Interoperability
[Diagram labels: GPHD data in/out; GPHD in GPDB Query; File on HD; GPDB External Tables]
21.
Greenplum Database External Tables for Hadoop
• Bring GPDB relational expressive power to HDFS
– HDFS data presented as external tables
– HDFS data supporting full SQL syntax
• Have ALL, PART or NONE of your data in HDFS
• Leverage full parallelism of both Hadoop and GPDB
– GPDB can read from/write to HDFS
Example:
    Select count(*) from HDFS_data h, GPDB_data g
    where h.key = g.key;
    Insert into HDFS_data select * from GPDB_data;
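The two example statements can be tried out locally to see what they compute. The sketch below uses Python's built-in sqlite3 module with two ordinary in-memory tables as a stand-in. It is not Greenplum, it involves no HDFS or external-table definition, and the `payload` column and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-ins only: in the slide, HDFS_data would be an external table backed by HDFS.
conn.execute("CREATE TABLE HDFS_data (key INTEGER, payload TEXT)")
conn.execute("CREATE TABLE GPDB_data (key INTEGER, payload TEXT)")
conn.executemany("INSERT INTO HDFS_data VALUES (?, ?)", [(1, "h1"), (2, "h2")])
conn.executemany("INSERT INTO GPDB_data VALUES (?, ?)", [(2, "g2"), (3, "g3")])

# First statement from the slide: count rows whose key appears on both sides.
join_count = conn.execute(
    "SELECT count(*) FROM HDFS_data h, GPDB_data g WHERE h.key = g.key"
).fetchone()[0]

# Second statement from the slide: copy every row from the database table to the HDFS side.
conn.execute("INSERT INTO HDFS_data SELECT * FROM GPDB_data")
rows_after_insert = conn.execute("SELECT count(*) FROM HDFS_data").fetchone()[0]

print(join_count, rows_after_insert)
```

The point of the slide is that the same SQL works whether the table on either side lives in the database or in HDFS; this stand-in only shows the query semantics.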
22.
Greenplum Enterprise HD HDFS Integration – Parallelized Flow
• Reading:
– Each GPDB segment reads a portion of the file
  • Segment i of n reads the i/n-th portion
– Access offset from HDFS namenode
– Read data directly from HDFS datanode
• Writing:
– Each GPDB segment writes a file
– HDFS balancing distributes the load evenly across datanodes
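The "segment i of n reads the i/n-th portion" rule amounts to simple offset arithmetic. The Python sketch below computes a byte range per segment, assuming 0-based segment numbers and a split by raw byte offset. The function name is hypothetical, and a real reader would also need to align ranges to record boundaries, which is not shown.

```python
from typing import Tuple


def segment_byte_range(file_size: int, segment: int, total_segments: int) -> Tuple[int, int]:
    """Return the half-open byte range [start, end) that one segment should read."""
    if not 0 <= segment < total_segments:
        raise ValueError("segment must be in [0, total_segments)")
    # Integer arithmetic keeps ranges contiguous and covers the file exactly once.
    start = file_size * segment // total_segments
    end = file_size * (segment + 1) // total_segments
    return start, end


# Hypothetical 1000-byte file split across 3 segments.
ranges = [segment_byte_range(1000, i, 3) for i in range(3)]
print(ranges)
```

Because each range ends where the next begins, every segment can read its portion independently and in parallel, with no byte read twice.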
23.
Big Data Analytics “Stack”
• Analytic Toolsets (Business Analytics, BI, Statistics, etc.)
• Greenplum Chorus: Enterprise Collaboration Platform for Data
• Greenplum Database: World’s Most Scalable MPP Database Platform
• Greenplum HD: Enterprise Analytics Platform for Unstructured Data
• Greenplum Data Computing Appliances: Purpose-built for Big Data Analytics
24.
THANK YOU