Hadoop for shanghai dev meetup
Roby Chen
1.
Hadoop
(Shanghai Developer Meetup – Sept 15, 2011)
余家昌 (Andrew Yu), EMC Greenplum
© Copyright 2011 EMC Corporation. All rights reserved.
2.
The Elephant Chase
4.
Yahoo! Hadoop use cases
• Personalized Yahoo! Homepage
• Yahoo! Mail anti-spam
• Search and Ad pipelines
• Ad inventory prediction
• Data analytics
• etc.
5.
Enterprise Use Case: “Big ETL”
Challenge: Transform Massive Data Flows Containing Data Needed for Complex Analysis
Solution: Hadoop/MapReduce as ETL fabric to load to Analytic Database
• Examples:
  – Web Traffic Reduction
  – Network Traffic & Performance Analysis
  – Location Analytics for People and Goods
  – Smart Electric Power Grid
  – Genome Analysis
  – Clinical Outcome Research & Analysis
• Data Sources:
  – Web server & app server logs
  – CDR / xDRs
  – Router & Switching Subsystem Logs
  – Sensor networks
• Components:
  – Hadoop: Massively-parallel ingest, storage and analysis
  – MapReduce: Runs multiple cascaded custom analysis / extraction on capture data
  – Connectors move structured data to Analytics DB
• Hadoop’s Roles:
  – Capture TBs/day of machine-generated data
  – Quality: Run data quality tasks in MapReduce
  – Execute MapReduce flows
  – Extract/Combine data/metadata
  – Move processed data to analytic DB
• Limitations & Cautions:
  – Software development, More parts (Cascading/Flow), Maintainability
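The MapReduce ETL step on this slide can be sketched in plain Python, without the Hadoop API. This is only an illustration under stated assumptions: the log format, the field names, and the functions `map_record`, `reduce_hits`, and `run_etl` are hypothetical and do not come from the deck. The sketch shows a data quality check in the map phase, followed by a reduce phase that produces structured rows of the kind a connector could load into an analytic database.

```python
from collections import defaultdict

# Hypothetical web-server log lines: "<ip> <path> <status> <bytes>"
RAW_LOGS = [
    "10.0.0.1 /index.html 200 512",
    "10.0.0.2 /index.html 200 512",
    "garbage line",
    "10.0.0.1 /cart 500 0",
    "10.0.0.3 /cart 200 128",
]

def map_record(line):
    """Map phase: parse one line and drop malformed input (data quality task)."""
    parts = line.split()
    if len(parts) != 4 or not parts[2].isdigit() or not parts[3].isdigit():
        return []  # reject bad records instead of failing the whole job
    _ip, path, status, size = parts
    is_error = 1 if status.startswith("5") else 0
    return [(path, (1, int(size), is_error))]

def reduce_hits(path, values):
    """Reduce phase: collapse all values for one key into a structured row."""
    return {
        "path": path,
        "hits": sum(v[0] for v in values),
        "bytes": sum(v[1] for v in values),
        "errors": sum(v[2] for v in values),
    }

def run_etl(lines):
    """Local stand-in for the shuffle: group mapped pairs by key, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_record(line):
            groups[key].append(value)
    return [reduce_hits(key, groups[key]) for key in sorted(groups)]

ROWS = run_etl(RAW_LOGS)
```

On a real cluster the grouping in `run_etl` would be done by the framework's shuffle, and the map and reduce functions would run in parallel across nodes.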
6.
Enterprise Use Case: Fraud Detection
Challenge: Identify & alert fraudulent activity patterns
Solution: Hadoop/MapReduce to filter & correlate communications
• Examples:
  – ESP’s - Email Fraud
  – Finance/Banking - Bank Fraud
  – Advertising - Click Fraud
  – Telecom – Network fraud
• Data Sources:
  – Web & app server logs
  – IP/Call Records
  – Email Traffic
  – Customer Transaction Data
  – Banking/Credit Data
• Components:
  – Hadoop: Massively-parallel ingest, storage and analysis
  – Mahout: Machine learning tool for building fraud algorithms
  – MapReduce: Rapid analysis & algorithm deployment
• Hadoop’s Role(s):
  – Massive ingest of historical/real-time data
  – Build/Validate model for fraud detection manually or using Mahout
  – Parallel MapReduce jobs for near real-time fraud detection
• Limitations & Cautions:
  – Software development, Partial Solution (not Real-time, not Interactive)
7.
Enterprise Use Case: Cluster Analysis
Challenge: Grouping a collection of data according to common similarities
Solution: Process and Refine in Hadoop and load into Analytical DB
• Examples:
  – Customer segmentation
  – Financial cost/risk analysis
  – Patient-centric healthcare
  – Financial stock classification
  – Social network analysis
• Data Sources:
  – Health records
  – Sales data
  – Human genome sequences
  – Financial trading data
  – Facebook/Twitter/LinkedIn
• Components:
  – Hadoop: Flexible data storage as volume increases and structures vary
  – MapReduce: Cascading allows data processing with minimal adjustments
  – Optional: Connectors to move results to Analytic DB
• Hadoop’s Role(s):
  – Flexible: Allow agile implementation of and unit testing of algorithms
  – Large scale analysis in Hadoop creates more accurate groupings
  – Rapid, parallel processing in MapReduce
• Limitations & Cautions:
  – Software development, Complex Integration with Sources
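The slide does not name a clustering algorithm. As one possible illustration, the sketch below uses k-means, which maps naturally onto MapReduce: the map step assigns each record to its nearest centroid, and the reduce step averages the records in each group. The customer features, the starting centroids, and all function names are assumptions made for this example.

```python
def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))

def kmeans_step(points, centroids):
    """One iteration: map each point to a cluster, then reduce to new means."""
    groups = {}
    for pt in points:                      # map: emit (cluster_id, point)
        groups.setdefault(nearest(pt, centroids), []).append(pt)
    new_centroids = []
    for idx, cen in enumerate(centroids):  # reduce: average per cluster_id
        members = groups.get(idx)
        if not members:
            new_centroids.append(cen)      # keep the centroid of an empty cluster
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(cen)))
        )
    return new_centroids

def kmeans(points, centroids, max_iter=20):
    """Repeat until the centroids stop moving or max_iter is reached."""
    for _ in range(max_iter):
        updated = kmeans_step(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids

# Hypothetical customer features: (orders per month, average basket value)
CUSTOMERS = [
    (1.0, 10.0), (2.0, 12.0), (1.0, 14.0),
    (9.0, 80.0), (10.0, 82.0), (11.0, 84.0),
]
CENTROIDS = kmeans(CUSTOMERS, [(0.0, 0.0), (10.0, 100.0)])
```

With this toy data the two centroids settle on the low-spend and high-spend groups after one update. In a Hadoop setting each iteration would typically be one MapReduce job.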
8.
Greenplum HD: Community Edition Stack
100% APACHE
Currently supported:
• Hive, Pig, HBase, Zookeeper
• MapReduce Framework (MapRed)
• Hadoop Distributed File System (HDFS)
Future releases may include support for Oozie and Mahout
9.
Greenplum HD: Enterprise Edition Stack
100% APACHE INTERFACE
Currently supported:
• Enhanced Monitoring
• Hive, Pig, HBase, Zookeeper
• MapReduce Framework (MapRed)
• Hadoop Distributed File System (HDFS)
Future releases may include support for Oozie and Mahout
10.
Greenplum HD: Enterprise Edition
Enterprise-Ready Hadoop Platform for Unstructured Data
• Faster
  – 2 – 5x Faster than Apache Hadoop
• Reliable
  – High Availability
  – Mirroring
• Easier to Use
  – NFS mountable
  – System Management
11.
Greenplum Enterprise HD is Faster than Other Distributions
[Charts: DFSIO, Read and Write throughput in MB/sec (higher is better); Terasort, 3.5 TB, elapsed time in minutes (lower is better)]
Test setup: 10 node cluster, 2x Quad-Core, 24G DRAM, 12 x 1TB SATA Drives @ 7200 rpm, Quad NICs
12.
Greenplum Enterprise HD Distributed Name Node
• Fully distributed service running on all Hadoop nodes
• Automatic and transparent failover
• Persistent metadata
• Highly scalable in number of files
[Diagram: Hadoop nodes, each labeled NN]
13.
Greenplum Enterprise HD Job Tracker High Availability
• Assures business continuity
• Designed for mission critical use
  – Automatic stateful restart
  – Task Tracker reconnects without task loss
  – Persistent completed task state
[Diagram labels: Greenplum Enterprise HD Distribution for Apache Hadoop; Enterprise HD MapReduce; Job Tracker HA; Distributed Name Node; Enterprise HD Lockless Storage Services]
14.
Greenplum Enterprise HD Snapshots
• Intelligent Snapshots
  – Automatic data deduplication
  – Block sharing for space savings
• Fast and flexible
  – Zero performance loss when writing to the original
• Easy to manage
  – Scheduled or on-demand
  – Drag and drop recovery
[Diagram: Hadoop / HBase applications and NFS applications READ / WRITE to Enterprise HD Lockless Storage Services; REDIRECT ON WRITE FOR SNAPSHOT over blocks A, B, C, C’, D; Snapshot 1, Snapshot 2, Snapshot 3]
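The redirect-on-write idea in the diagram (block C replaced by C’ while snapshots keep pointing at C) can be sketched with a toy in-memory volume. This is not Greenplum's implementation: the `Volume` class and its methods are hypothetical. It only shows why taking a snapshot copies no data and why a later write leaves the snapshot readable.

```python
class Volume:
    """Toy redirect-on-write volume: snapshots share blocks with the live map."""

    def __init__(self, blocks):
        self.store = {}       # physical block id -> data
        self.live = []        # logical index -> physical block id
        self.snapshots = {}   # snapshot name -> frozen copy of the block map
        for data in blocks:
            self.live.append(self._allocate(data))

    def _allocate(self, data):
        block_id = len(self.store)
        self.store[block_id] = data
        return block_id

    def snapshot(self, name):
        # Only the block map is copied; no data block is duplicated.
        self.snapshots[name] = tuple(self.live)

    def write(self, index, data):
        # Redirect: new data goes to a fresh block, and the old block stays
        # intact for any snapshot that still references it.
        self.live[index] = self._allocate(data)

    def read(self, index, snapshot=None):
        block_map = self.live if snapshot is None else self.snapshots[snapshot]
        return self.store[block_map[index]]


vol = Volume(["A", "B", "C", "D"])
vol.snapshot("snap1")
vol.write(2, "C'")
```

After the write, the live volume reads C’ at index 2 while `snap1` still reads C, and only one extra block has been allocated. The sketch never frees blocks; a real system would reclaim blocks that no snapshot references.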
15.
Greenplum Enterprise HD Mirroring
• Business Continuity
  – Efficient design
  – Differential deltas are updated
  – Data is compressed and check-summed
• Easy to manage
  – Scheduled or on-demand
  – Consistent point-in-time
[Diagram: Production (Datacenter 1) to Research (Datacenter 2) over WAN; Production to Cloud over WAN]
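Differential mirroring with compressed, check-summed data can be illustrated with a small sketch. The block layout, the choice of SHA-256 and zlib, and the functions `build_delta` and `apply_delta` are assumptions for this example, not details taken from the product. The source ships only blocks whose checksum differs from what the mirror already holds, and the mirror verifies each block before installing it.

```python
import hashlib
import zlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_delta(source: dict, mirror_index: dict) -> dict:
    """Return only the blocks whose checksum differs from the mirror's copy."""
    delta = {}
    for block_id, data in source.items():
        digest = checksum(data)
        if mirror_index.get(block_id) != digest:
            delta[block_id] = (digest, zlib.compress(data))
    return delta

def apply_delta(mirror: dict, mirror_index: dict, delta: dict) -> None:
    """Decompress, verify, then install each block on the mirror side."""
    for block_id, (digest, payload) in delta.items():
        data = zlib.decompress(payload)
        if checksum(data) != digest:
            raise ValueError(f"checksum mismatch for block {block_id}")
        mirror[block_id] = data
        mirror_index[block_id] = digest

source = {0: b"alpha", 1: b"beta", 2: b"gamma"}
mirror, mirror_index = {}, {}

first = build_delta(source, mirror_index)    # initial sync ships every block
apply_delta(mirror, mirror_index, first)

source[1] = b"beta-v2"                       # one block changes at the source
second = build_delta(source, mirror_index)   # next sync ships only that block
apply_delta(mirror, mirror_index, second)
```

The second sync transfers a single block, which is the point of sending deltas over a WAN link. Deleted blocks are not handled in this sketch.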
16.
Greenplum Enterprise HD Direct Access Using NFS
• Simple application integration
  – Leverage NFS for random read/write access
• Direct access for standard Hadoop tools
  – Command line utilities
  – File browsers
  – Desktop utilities
[Diagram labels: Greenplum Enterprise HD Distribution for Apache Hadoop; Enterprise HD MapReduce; Job Tracker HA; Distributed Name Node; Enterprise HD Lockless Storage Services]
17.
Greenplum Enterprise HD Simple Management
• Intuitive
• Insightful
• Complete
• One node or thousands
18.
Greenplum HD: Software Distributions
Features | Community Edition | Enterprise Edition
Apache Compatibility | 100% Apache Open Source | 100% API Compatible
Name Node High Availability | Reference Implementation | Distributed and High Availability
Job Tracker HA | Reference Implementation | HT High Availability
Name Node Scalability | NN Metadata in Memory | Distributed Name Node
Premium Support | Yes | Yes
Performance | | 2 - 5x faster than Community Edition
Snapshots | No | Yes
Mirrors | No | Yes
NFS Mounts | No | Yes
System Management | No | Yes
Available for Ordering | May 9th 2011 | Q3
Pricing | Per Node Pricing | Per Node Pricing
19.
Greenplum HD on Data Computing Appliance
• Introducing the world’s first:
  – High-performance
  – Purpose-built
  – Data co-processing Hadoop appliance
• Combining Greenplum Database and Greenplum Hadoop in one appliance
20.
GPDB GPHD Interoperability
[Diagram labels: GPHD data in/out; GPHD in GPDB Query; File on HD; GPDB External Tables]
21.
Greenplum Database External Tables for Hadoop
• Bring GPDB relational expressive power to HDFS
  – HDFS data presented as external tables
  – HDFS data supporting full SQL syntax
• Have ALL, PART or NONE of your data in HDFS
• Leverage full parallelism of both Hadoop and GPDB
  – GPDB can read from/write to HDFS,
Example:
  Select count(*) from HDFS_data h, GPDB_data g where h.key = g.key;
  Insert into HDFS_data select * from GPDB_data;
22.
Greenplum Enterprise HD HDFS Integration – Parallelized Flow
• Reading:
  – Each GPDB segment reads a portion of the file
    • Segment i of n reads the i/n-th portion
  – Access offset from HDFS namenode
  – Read data directly from HDFS datanode
• Writing:
  – Each GPDB segment writes a file
  – HDFS balancing distributes the load evenly across datanodes
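The rule that segment i of n reads the i/n-th portion of a file amounts to a byte-range calculation. The sketch below shows one way to compute it, assuming 0-based segment indexes and half-open ranges; `segment_range` and `read_portion` are hypothetical names, and an in-memory byte string stands in for an HDFS file.

```python
def segment_range(file_size: int, segment: int, num_segments: int) -> tuple:
    """Byte range [start, end) for one segment, using a 0-based segment index."""
    if not 0 <= segment < num_segments:
        raise ValueError("segment index out of range")
    start = file_size * segment // num_segments
    end = file_size * (segment + 1) // num_segments
    return start, end

def read_portion(data: bytes, segment: int, num_segments: int) -> bytes:
    """Return the slice of data that belongs to the given segment."""
    start, end = segment_range(len(data), segment, num_segments)
    return data[start:end]

FILE = bytes(range(10))                                   # stand-in for a file
RANGES = [segment_range(len(FILE), i, 4) for i in range(4)]
PARTS = [read_portion(FILE, i, 4) for i in range(4)]
```

Integer division keeps the ranges contiguous and non-overlapping even when the file size is not a multiple of the segment count, so every byte is read exactly once. A real reader would also have to align each range to record boundaries, which this sketch ignores.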
23.
Big Data Analytics “Stack”
• Analytic Toolsets (Business Analytics, BI, Statistics, etc.)
• Greenplum Chorus: Enterprise Collaboration Platform for Data
• Greenplum Database: World’s Most Scalable MPP Database Platform
• Greenplum HD: Enterprise Analytics Platform for Unstructured Data
• Greenplum Data Computing Appliances: Purpose-built for Big Data Analytics
24.
THANK YOU