Greenplum Database on HDFS
DataWorks Summit
1.
Greenplum Database on HDFS (GOH)
Presenter: Lei Chang (lei.chang@emc.com)
© Copyright 2012 EMC Corporation. All rights reserved.
2.
Outline
• Introduction
• Architecture
• Features
• Performance study
3.
EMC Greenplum Unified Analytics Platform
4.
GOH use cases
• All Greenplum customers who want to minimize the amount of duplicate storage they have to buy for analytics: managing scale is much easier when you focus on the growth of one storage pool than when you have many fragmented pools.
• Customers who want the functionality of GPDB together with the generality and storage provided by their HBase store.
• Potential ability to plug various storage systems, such as Isilon, Atmos, MapR Filesystem, CloudStore, GPFS, Lustre, PVFS, and Ceph, into the GPDB/Hadoop software stack.
5.
[Architecture diagram: a master host and five segment hosts connected by the GPDB interconnect, each segment host running primary and mirror segments. Segments send metadata operations to the HDFS NameNode and read/write tables in the HDFS filespace on DataNodes, with block replication across DataNodes in Rack1 and Rack2.]
6.
GOH features
• A pluggable storage layer: if a new file system supports the full semantics of the HDFS interface, it can be added as storage for GPDB append-only (AO) tables.
• Attributed filespaces.
• Native support for HDFS filespaces.
• Full transaction support for AO tables on HDFS.
• HDFS truncation, to support the transaction capability of GOH.
• A native HDFS C interface, to eliminate the concurrency limitation of the current Java JNI-based client.
• All current GPDB functionality, including fault tolerance.
7.
Pluggable storage: user interface

CREATE FUNCTION open_func AS '(' obj_file ',' link_symbol ')'

CREATE FILESYSTEM filesystemname [OWNER ownername] (
    connect = connect_func,
    open = open_func,
    close = close_func,
    read = read_func,
    write = write_func,
    seek = seek_func,
    ...
)
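The idea behind CREATE FILESYSTEM is a named table of callbacks that the storage layer dispatches through, so a new backend only has to supply those functions. The sketch below models that dispatch in Python with a toy in-memory backend; `FileSystemOps`, `REGISTRY`, `create_filesystem`, and the `mem_*` functions are hypothetical names for illustration, not GPDB internals.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class FileSystemOps:
    """Callback table, one entry per operation named in CREATE FILESYSTEM."""
    connect: Callable
    open: Callable
    close: Callable
    read: Callable
    write: Callable
    seek: Callable


# Hypothetical catalog of registered storage backends, keyed by filesystem name.
REGISTRY: Dict[str, FileSystemOps] = {}


def create_filesystem(name: str, ops: FileSystemOps) -> None:
    """Register a backend under a name (plays the role of CREATE FILESYSTEM)."""
    if name in REGISTRY:
        raise ValueError(f"filesystem {name!r} already exists")
    REGISTRY[name] = ops


class MemFile:
    """Handle onto a shared bytearray, with its own read/write position."""

    def __init__(self, buf: bytearray):
        self.buf = buf
        self.pos = 0


# Toy in-memory backend: a "connection" is just a dict of path -> bytearray.
def mem_connect(uri: str) -> dict:
    return {}


def mem_open(conn: dict, path: str, mode: str) -> MemFile:
    if mode == "w":
        conn[path] = bytearray()          # opening for write starts an empty file
    return MemFile(conn.setdefault(path, bytearray()))


def mem_close(f: MemFile) -> None:
    pass                                  # nothing to flush in memory


def mem_read(f: MemFile, n: int) -> bytes:
    data = bytes(f.buf[f.pos:f.pos + n])
    f.pos += len(data)
    return data


def mem_write(f: MemFile, data: bytes) -> int:
    f.buf[f.pos:f.pos + len(data)] = data
    f.pos += len(data)
    return len(data)


def mem_seek(f: MemFile, offset: int) -> int:
    f.pos = offset
    return f.pos


# The storage layer only calls through the registry, never the backend directly.
create_filesystem("memfs", FileSystemOps(
    mem_connect, mem_open, mem_close, mem_read, mem_write, mem_seek))
ops = REGISTRY["memfs"]
conn = ops.connect("memfs://local")
w = ops.open(conn, "/goh/gpseg0/t1", "w")
ops.write(w, b"hello world")
ops.close(w)
r = ops.open(conn, "/goh/gpseg0/t1", "r")
ops.seek(r, 6)
data = ops.read(r, 5)
ops.close(r)
```

Swapping in another backend (an HDFS client, for instance) would mean registering a different `FileSystemOps` under a new name; callers that go through `REGISTRY` would not change.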
8.
Attributed filespaces
• The number of replicas for the tables in the filespace
• Whether mirroring is supported for the tables stored in the filespace
• Other attributes…
9.
Example SQL

CREATE FILESPACE goh ON HDFS (
    1: 'hdfs://name-node/users/changl1/gp-data/gohmaster/gpseg-1',
    2: 'hdfs://name-node/users/changl1/gp-data/goh/gpseg0',
    3: 'hdfs://name-node/users/changl1/gp-data/goh/gpseg1'
) WITH (NUMREPLICA = 3, MIRRORING = false);
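The example lists one HDFS location per database instance. For a cluster with more segments, the statement could be generated as sketched below. The helper `filespace_ddl` is hypothetical, and its numbering (entry 1 for the master at `gpseg-1`, entry k + 2 for segment k at `gpseg<k>`) is inferred from the three entries shown on the slide rather than from GOH documentation.

```python
def filespace_ddl(name: str, base_uri: str, master_dir: str, seg_dir: str,
                  num_segments: int, num_replica: int = 3,
                  mirroring: bool = False) -> str:
    """Build a CREATE FILESPACE ... ON HDFS statement in the slide's syntax.

    Layout inferred from the example: entry 1 is the master (gpseg-1), and
    entry k + 2 is segment k (gpseg<k>), each in its own HDFS directory.
    """
    entries = [f"    1: '{base_uri}/{master_dir}/gpseg-1'"]
    for content in range(num_segments):
        entries.append(f"    {content + 2}: '{base_uri}/{seg_dir}/gpseg{content}'")
    return (
        f"CREATE FILESPACE {name} ON HDFS (\n"
        + ",\n".join(entries)          # commas only between entries, none trailing
        + f"\n) WITH (NUMREPLICA = {num_replica}, "
        f"MIRRORING = {str(mirroring).lower()});"
    )


ddl = filespace_ddl("goh", "hdfs://name-node/users/changl1/gp-data",
                    "gohmaster", "goh", num_segments=2)
print(ddl)
```

With `num_segments=2` this reproduces the three-entry statement above.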
10.
Transaction support
• When a load transaction is aborted, some garbage data is left at the end of the file. In HDFS-like systems, data cannot be truncated or overwritten, so some method is needed to handle the partial data in order to support transactions.
– Option 1: Load data into a separate HDFS file for each load. The number of files grows without bound.
– Option 2: Use metadata to record the boundary of the garbage data, and implement a vacuum-like clean-up mechanism.
– Option 3: Implement HDFS truncation.
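Options 2 and 3 differ in what happens to the bytes of an aborted load. The toy model below (a hypothetical `AppendOnlyFile` class over an in-memory byte array, not GOH code) keeps a list of committed byte ranges as the metadata. With `use_truncate=True`, it cuts the aborted tail off immediately (Option 3). Otherwise, it leaves the garbage in place, skips it on scans, and removes it later in `vacuum` (Option 2). Option 1 is not modeled.

```python
from typing import List, Tuple


class AppendOnlyFile:
    """Toy append-only file that supports aborting a load."""

    def __init__(self, use_truncate: bool):
        self.use_truncate = use_truncate
        self.data = bytearray()                   # physical bytes in the file
        self.valid: List[Tuple[int, int]] = []    # metadata: committed (start, end) ranges

    def load(self, rows: bytes, commit: bool) -> None:
        start = len(self.data)
        self.data += rows                         # every load appends at the physical end
        if commit:
            self.valid.append((start, len(self.data)))
        elif self.use_truncate:
            del self.data[start:]                 # Option 3: truncate back to pre-load length
        # Option 2: aborted bytes stay, but are never recorded in self.valid

    def scan(self) -> bytes:
        """Readers see only committed ranges."""
        return b"".join(bytes(self.data[s:e]) for s, e in self.valid)

    def vacuum(self) -> None:
        """Option 2 clean-up: rewrite the file with committed bytes only."""
        compacted = bytearray()
        new_valid = []
        for s, e in self.valid:
            new_valid.append((len(compacted), len(compacted) + (e - s)))
            compacted += self.data[s:e]
        self.data, self.valid = compacted, new_valid


opt2 = AppendOnlyFile(use_truncate=False)
opt3 = AppendOnlyFile(use_truncate=True)
for f in (opt2, opt3):
    f.load(b"row1;", commit=True)
    f.load(b"junk;", commit=False)                # aborted load
    f.load(b"row2;", commit=True)

size_before_vacuum = len(opt2.data)               # garbage still occupies space
opt2.vacuum()
```

Both variants return the same rows to readers. The trade-off is that Option 2 needs extra metadata and a rewrite pass, while Option 3 needs a truncate operation from the file system.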
11.
HDFS C client: why
• libhdfs (the current HDFS C client) is based on JNI, which makes it difficult for GOH to support a large number of concurrent queries.
• Example:
– 6 segments on each segment host
– 50 concurrent queries
– Each query may have 12 or more QE (query executor) processes that perform scans
– That gives about 600 processes, which start 600 JVMs to access HDFS
– If each JVM uses 500 MB of memory, the JVMs consume 600 × 500 MB = 300 GB of memory
– Thus naïve use of libhdfs is not suitable for GOH. We currently have three options to solve this problem.
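The memory estimate above is simple arithmetic, reproduced here so the numbers can be varied. The figures come from the slide and are read here as per segment host. The 500 MB per JVM is the slide's assumed footprint, not a measured value.

```python
# Figures taken from the slide (read here as per segment host).
concurrent_queries = 50
scan_qes_per_query = 12      # "12 or more", so this is a lower bound
jvm_memory_mb = 500          # assumed footprint of one JVM

# With libhdfs, every scanning QE process embeds its own JVM.
jvms = concurrent_queries * scan_qes_per_query
total_mb = jvms * jvm_memory_mb

print(f"{jvms} JVMs x {jvm_memory_mb} MB = {total_mb / 1000:.0f} GB "
      f"({total_mb / 1024:.0f} GiB)")
# prints: 600 JVMs x 500 MB = 300 GB (293 GiB)
```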
12.
HDFS client: three options
• Option 1: Use HDFS FUSE. HDFS FUSE introduces some performance overhead, and its scalability has not been verified yet.
• Option 2: Implement a C RPC interface that communicates directly with the NameNode and DataNodes. Drawback: it requires many changes whenever the RPC protocol changes.
• Option 3: Implement a WebHDFS-based C client. WebHDFS is based on HTTP, which also introduces some cost, so its performance should be benchmarked. The WebHDFS-based approach has several benefits, such as ease of implementation and low maintenance cost.
• We have currently implemented options 2 and 3.
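Option 3 works because WebHDFS exposes file reads as plain HTTP: a GET on `/webhdfs/v1/<path>?op=OPEN` with optional `offset` and `length` parameters, which the NameNode redirects to a DataNode. The Python sketch below only builds such a request URL to show the protocol shape (a C client would issue the same request through an HTTP library). The helper name, the file path, and port 50070 (the NameNode HTTP port in Hadoop releases of that era) are assumptions.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def webhdfs_open_url(host: str, port: int, path: str,
                     offset: int = 0, length: Optional[int] = None) -> str:
    """Build the URL of a WebHDFS OPEN request (an HTTP GET).

    The NameNode answers with a redirect to a DataNode that serves the bytes,
    so the client needs an HTTP library rather than an embedded JVM.
    """
    params = {"op": "OPEN", "offset": offset}
    if length is not None:
        params["length"] = length
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(params)}"


# Read the first 4 KiB of a hypothetical segment file.
url = webhdfs_open_url("name-node", 50070,
                       "/users/changl1/gp-data/goh/gpseg0/example_file",
                       offset=0, length=4096)
print(url)
```

Because each request is an ordinary HTTP call, one QE process costs an HTTP connection rather than a JVM. The price is the per-request HTTP overhead that the slide suggests benchmarking.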
13.
HDFS truncate
• API
– truncate (DistributedFileSystem): truncate a file to a specified length
– void truncate(Path src, long length) throws IOException;
• Semantics
– Only a single writer/appender/truncater is allowed. Users can call truncate only on closed files.
– HDFS guarantees the atomicity of a truncate operation: it either succeeds or fails, and it never leaves the file in an undefined state.
– Concurrent readers may read the content of a file that is being truncated by a concurrent truncate operation, but they must still be able to read all data not affected by that truncation.
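The semantics above can be stated as a small executable model. In the sketch below, `ToyHdfsFile` is hypothetical and stands in for a real HDFS file. It rejects truncation while a writer holds the file open, validates before mutating so that a failure leaves the file unchanged, and replaces the contents in a single assignment. A reader's earlier view still contains the unaffected prefix.

```python
class ToyHdfsFile:
    """Minimal model of the truncate contract on the slide."""

    def __init__(self, data: bytes):
        self.data = bytes(data)
        self.open_for_write = False   # a writer/appender holds the single lease

    def truncate(self, new_length: int) -> None:
        # Check every precondition first: a failed call must not change the file.
        if self.open_for_write:
            raise IOError("truncate is only allowed on closed files")
        if not 0 <= new_length <= len(self.data):
            raise IOError("new length out of range")
        self.data = self.data[:new_length]   # one assignment: all or nothing


f = ToyHdfsFile(b"committed|garbage")
reader_view = f.data                  # what a concurrent reader saw beforehand
f.truncate(len(b"committed|"))

busy = ToyHdfsFile(b"abc")
busy.open_for_write = True
try:
    busy.truncate(1)
    rejected = False
except IOError:
    rejected = True                   # truncate on an open file is refused
```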
14.
HDFS truncate implementation (HDFS-3107)
• Get the lease of the to-be-truncated file (F).
• If the truncate is at a block boundary:
– Delete the tail blocks as an atomic operation.
• Otherwise (the truncate is not at a block boundary):
– Copy the part of the last block (B) that belongs to the result file (R) into a temporary file (T).
– Remove the tail blocks of file F (including B, B+1, …), then concatenate F and T to get R.
• Release the lease on the file.
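The block-level steps listed above can be traced on a toy file. The sketch below represents a file as a Python list of byte blocks with a hypothetical 4-byte block size. It follows the slide's two cases, dropping tail blocks at a boundary and copy-then-concat otherwise. Lease handling appears only as comments, and none of this is actual HDFS code.

```python
from typing import List

BLOCK_SIZE = 4  # hypothetical, tiny so the example is easy to follow


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Model an HDFS file as a list of blocks; only the last may be partial."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def truncate_blocks(f_blocks: List[bytes], new_length: int,
                    block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Return the blocks of the result file R after truncating F to new_length."""
    if not 0 <= new_length <= sum(len(b) for b in f_blocks):
        raise ValueError("new_length out of range")
    # Step 1 (not modeled): acquire the lease on F so no writer can interfere.
    full_blocks, remainder = divmod(new_length, block_size)
    if remainder == 0:
        # At a block boundary: drop the tail blocks in one atomic step.
        result = f_blocks[:full_blocks]
    else:
        b = full_blocks                          # index of block B
        temp_t = [f_blocks[b][:remainder]]       # copy R's part of B into T
        head = f_blocks[:b]                      # F after removing B, B+1, ...
        result = head + temp_t                   # concat F and T to get R
    # Final step (not modeled): release the lease on the file.
    return result


blocks = split_blocks(b"abcdefghij")             # three blocks: abcd, efgh, ij
on_boundary = truncate_blocks(blocks, 8)
mid_block = truncate_blocks(blocks, 6)
```

The boundary case only deletes whole blocks. The other case has to rewrite part of block B, which is why it goes through a temporary file.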
15.
Performance study (to be added)
16.
Thank you!