© Copyright 2019 Pivotal Software, Inc. All rights Reserved.
September 25, 2019
Pivotal
Introducing Pivotal Greenplum, Which Keeps Evolving as a Cloud DWH
Agenda
●
Ø Pivotal Greenplum
Ø Greenplum - Pivotal Greenplum 6
● DWH Pivotal Greenplum
Ø
Ø DWH
Ø
●
Pivotal Greenplum
• Pivotal Data Suite (CPU )
•
•
• ( ) K8s
• MPP DB
•
• ( etc..)
Pivotal Greenplum
MPP (Massively Parallel Processing)
[Figure: MPP architecture diagram. Labels: SQL, gNet]
[Figure: CPU and I/O diagram. Labels: CPU, I/O, HW, RDB, DB]
[Figure: hardware configuration (1/2)]
All 6 hosts: Intel E5-2680v3, 2CPU, 24Core, 256GB RAM
2 hosts: 1.8TB 10Krpm SAS HDD x 6, RAID5 (4 + 1 + 1)
Hosts #1 to #4: 1.8TB 10Krpm SAS HDD x 12, RAID5 (10 + 1 + 1), 8 RAID groups in total
Switch #1 and switch #2: 10Gbps x 52 each, mLAG
[Figure: hardware configuration (2/2)]
All 6 hosts: Intel E5-2660, 2CPU, 16Core, 64GB RAM
2 hosts: 300GB 10Krpm SAS HDD x 6, RAID5 (4 + 1 + 1)
Hosts #1 to #4: 300GB 10Krpm SAS HDD x 12, RAID5 (10 + 1 + 1), 8 RAID groups in total
Switch #1 and switch #2: 10Gbps x 52 each, mLAG
HA: Bonding, ports #1, #9, #17, #25 on each side
[Figure: Greenplum release timeline. Labels: OLTP, OLAP, BI; v1, v4, v5, v6; 2003 - 2009, 2010, 2013, 2015-2018, 2019]
DWH
• PB
• :
• SLA
•
•
• OSS
#1 DWH
September 4, 2019: Now Generally Available
Greenplum 6 Postgres
v8.4 – 2314 commits
v9.0 – 1859 commits
v9.1 – 2035 commits
v9.2 – 1945 commits
v9.3 – 1603 commits
v9.4 – 1964 commits
TOTAL: 11,720 Commits Merged
Code Quality via Open Source
Optimized for Big Data in Greenplum
“Customers frequently called out the open-source alignment with PostgreSQL as a strong and cost-effective positive” -- Gartner MQ 2019
Greenplum 6 OLTP
70
● OLTP
● OLTP
● 24,448 TPS for Update transactions in GP6
● 46,570 TPS for Single Row Insert in GP6
● 140,000 TPS for Select Only Query in GP6
●
Real world analytical database and data warehouse use cases require a mixed workload of long and short queries as well as updates and deletes
“Replicated”
“DISTRIBUTED REPLICATED”
● Greenplum
● /
● Replicated
Replicated dimension tables for fast local join
Greenplum 6 Replicated Tables
With Replicated Tables

create table table_replicated (a int, b text)
distributed replicated;

insert into table_replicated
select id, 'val ' || id
from generate_series(1, 10000) id;

select pg_relation_size('table_replicated');
 pg_relation_size
------------------
           917504

-- Size is multiplied by the number of primaries.

select gp_segment_id, count(*) from table_replicated
group by 1;
ERROR: column "gp_segment_id" does not exist
LINE 1: select gp_segment_id, count(*) from ...
               ^

-- The field gp_segment_id doesn't exist in replicated tables.

With Non Replicated table

create table table_non_replicated (a int, b text)
distributed randomly;

insert into table_non_replicated
select id, 'val ' || id
from generate_series(1, 10000) id;

select pg_relation_size('table_non_replicated');
 pg_relation_size
------------------
           458752

select gp_segment_id, count(*) from
table_non_replicated group by 1;
 gp_segment_id | count
---------------+-------
             0 |  5011
             1 |  4989
Greenplum 6 Replicated Tables Query Plan

With Replicated table

explain select count(*) from table_fact f inner join table_replicated d on f.a = d.a;
                                         QUERY PLAN
----------------------------------------------------------------------------------------------------
 Aggregate (cost=0.00..874.73 rows=1 width=8)
   -> Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..874.73 rows=1 width=8)
         -> Aggregate (cost=0.00..874.73 rows=1 width=8)
               -> Hash Join (cost=0.00..874.73 rows=50000 width=1)
                     Hash Cond: (table_fact.a = table_replicated.a)
                     -> Seq Scan on table_fact (cost=0.00..432.15 rows=50000 width=4)
                     -> Hash (cost=431.23..431.23 rows=10000 width=4)
                           -> Seq Scan on table_replicated (cost=0.00..431.23 rows=10000 width=4)
 Optimizer: PQO version 3.29.0

With Non Replicated table

explain select count(*) from table_fact f inner join table_non_replicated d on f.a = d.a;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=0.00..874.31 rows=1 width=8)
   -> Gather Motion 2:1 (slice3; segments: 2) (cost=0.00..874.31 rows=1 width=8)
         -> Aggregate (cost=0.00..874.31 rows=1 width=8)
               -> Hash Join (cost=0.00..874.31 rows=50000 width=1)
                     Hash Cond: (table_fact.a = table_non_replicated.a)
                     -> Redistribute Motion 2:2 (slice1; segments: 2) (cost=0.00..433.15 rows=50000 width=4)
                           Hash Key: table_fact.a
                           -> Seq Scan on table_fact (cost=0.00..432.15 rows=50000 width=4)
                     -> Hash (cost=431.22..431.22 rows=5000 width=4)
                           -> Redistribute Motion 2:2 (slice2; segments: 2) (cost=0.00..431.22 rows=5000 width=4)
                                 Hash Key: table_non_replicated.a
                                 -> Seq Scan on table_non_replicated (cost=0.00..431.12 rows=5000 width=4)
 Optimizer: PQO version 3.29.0

1 slice vs 3 slices: no redistribution with the replicated table.
H/W
● zStandard
● Facebook OSS
●
●
● CREATE TABLE WITH
WITH (compresstype=zstd)
zStandard
Zstd
1-9 (1)
(9)
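As a rough illustration of the WITH (compresstype=zstd) option above, the sketch below creates an append-optimized, column-oriented table compressed with zstd. The table name sales_zstd, its columns, and the chosen compresslevel are assumptions made for this example only; it also assumes a Greenplum build with zstd support, since compression storage options apply to append-optimized tables.

```sql
-- Hypothetical table; the name and columns are illustrative only.
CREATE TABLE sales_zstd (
    sale_id   bigint,
    sale_date date,
    amount    numeric(12,2)
)
WITH (
    appendonly    = true,    -- compression requires an append-optimized table
    orientation   = column,  -- column-oriented storage
    compresstype  = zstd,
    compresslevel = 1        -- assumed level: lower is faster, higher compresses more
)
DISTRIBUTED BY (sale_id);
```

Loading the same data at two different levels and comparing pg_relation_size, as the replicated-table slides do, is one simple way to weigh speed against compression ratio.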
SQL CTE
Using RECURSIVE, a WITH query can refer to its own output
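A minimal sketch of a WITH query that refers to its own output, using the familiar sum-of-integers pattern. The CTE name t and column n are arbitrary; the example assumes recursive CTEs are enabled in the target Greenplum version.

```sql
-- Sum the integers 1..100 with a recursive CTE.
WITH RECURSIVE t(n) AS (
    SELECT 1               -- non-recursive term: the seed row
    UNION ALL
    SELECT n + 1 FROM t    -- recursive term: refers to t's own output from the previous step
    WHERE n < 100          -- termination condition
)
SELECT sum(n) FROM t;      -- 5050
```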
ETL Writable CTE
Data modifying CTE allows several different operations in the same query
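A small sketch of a data-modifying CTE, assuming a hypothetical staging_orders table with an order_date column. The DELETE and the summary over the deleted rows run as one statement; which combinations are accepted can depend on the Greenplum version, so treat this as an outline rather than a tested ETL step.

```sql
-- Hypothetical table: staging_orders(order_id, order_date, ...).
WITH purged AS (
    DELETE FROM staging_orders
    WHERE order_date < DATE '2019-01-01'
    RETURNING *                       -- hand the deleted rows to the outer query
)
SELECT count(*)        AS rows_purged,
       min(order_date) AS oldest_purged,
       max(order_date) AS newest_purged
FROM purged;
```

Because the outer query reads the RETURNING output, the purge and its audit summary cannot drift apart the way two separate statements could.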
Unlogged :
● WAL Unlogged
● :
● DB
create unlogged table
table_unlogged
(a int , b text)
distributed randomly;
Private Cloud, Bare-Metal, Public Cloud
Greenplum Building Blocks
• The most performant way to run Greenplum on premise
• Pivotal Blueprint for Dell reference hardware configs
• Superior price/performance; no expensive proprietary hardware
• Certified and supported by Pivotal
Run Greenplum in Any Environment
Greenplum for Kubernetes
Other Kubernetes (on VMs or not)
Google Container Engine
Enterprise & Essentials (OSS K8s)
•
• : 100%
•
Public Cloud
Run Greenplum in Any Environment
●
○ AI
Pivotal Greenplum
○ ( )
●
○
○
○
○
●
○
○ DR AZ
○ HA
● 1
●
●
( )
●
● 5
○
●
● pgBouncer
DB
● gpsnap/gpcronsnap -
●
IaaS
Azure Resource Group Deployment
AWS CloudFormation
GCP Deployment Manager
[Figure: VM x 5, X, Data Volume, Snapshot, Restore]
Run Greenplum in Any Environment
Greenplum for Kubernetes
Other Kubernetes (on VMs or not)
Google Container Engine
Enterprise & Essentials (OSS K8s)
●
● K8s
● K8s
●
day-2
PKS
Container
Operator
Bringing Cloud Databases On-Premises
● Greenplum
(Postgres) / Pod /
VM(vMotion)
● Greenplum
●
●
●
●
K8s worker 1 K8s worker n
PKS/K8s cluster
pod pod
K8s worker VMs: 8 to 32 GB
● Greenplum
○
○
● Greenplum
○
Pod
● VM K8s
Greenplum Pod
○ Pod
Persistent Volume 1 . . n
K8s worker 1 K8s worker n
PKS/K8s cluster
pod pod
Find people whose names sound like ‘Peter’ and ‘Pavan’, who work at ‘Pivotal’, and who withdrew more than 200 at an ATM within 2km in the last 24 hours.
drop function if exists get_people(text, text, integer, integer, float, float);
-- Arguments: $1 person 1, $2 person 2, $3 amount, $4 duration in hours, $5 longitude, $6 latitude.
create function get_people(text, text, integer, integer, float, float) returns integer
as $$
declare
    linkchk integer; v1 record; v2 record;
begin
    execute 'truncate table results;';
    -- Outer loop: people whose first name sounds like $1.
    for v1 in
        select distinct a.id, a.firstname, a.lastname, amount, tran_date,
               c.lat, c.lng, address, a.description, d.score
        from people a, transactions b, location c,
             (select w.id, q.score
              from people w, gptext.search(table(select 1 scatter by 1), 'gpadmin.public.people', 'Pivotal', null) q
              where (q.id::integer) = w.id order by 2 desc) d
        where soundex(firstname) = soundex($1) and a.id = b.id and amount > $3
          -- Hours elapsed since the transaction, computed as now() minus tran_date.
          and (extract(epoch from now()) - extract(epoch from tran_date)) / 3600 < $4
          -- Within 2 km of the reference point.
          and st_distance_sphere(st_makepoint($5, $6), st_makepoint(c.lng, c.lat)) / 1000.0 <= 2.0
          and b.locid = c.locid and a.id = d.id
    loop
        -- Inner loop: the same query for first names that sound like $2.
        for v2 in
            select distinct a.id, a.firstname, a.lastname, amount, tran_date,
                   c.lat, c.lng, address, a.description, d.score
            from people a, transactions b, location c,
                 (select w.id, q.score
                  from people w, gptext.search(table(select 1 scatter by 1), 'gpadmin.public.people', 'Pivotal', null) q
                  where (q.id::integer) = w.id order by 2 desc) d
            where soundex(firstname) = soundex($2) and a.id = b.id and amount > $3
              and (extract(epoch from now()) - extract(epoch from tran_date)) / 3600 < $4
              and st_distance_sphere(st_makepoint($5, $6), st_makepoint(c.lng, c.lat)) / 1000.0 <= 2.0
              and b.locid = c.locid and a.id = d.id
        loop
            execute 'DROP TABLE IF EXISTS out, out_summary;';
            -- Breadth-first search over the links graph, starting from v1.
            execute 'SELECT madlib.graph_bfs(''people'',''id'',''links'',NULL,' || v1.id || ',''out'');';
            -- dist = 1 means v2 is directly linked to v1.
            select 1 into linkchk from out where dist = 1 and id = v2.id;
            if linkchk is not null then
                insert into results values (v1.id, v1.firstname, v1.lastname, v1.amount, v1.tran_date, v1.lat, v1.lng, v1.address, v1.description, v1.score);
                insert into results values (v2.id, v2.firstname, v2.lastname, v2.amount, v2.tran_date, v2.lat, v2.lng, v2.address, v2.description, v2.score);
            end if;
        end loop;
    end loop;
    return 0;
end
$$ language plpgsql;
-- person 1, person 2, amount, duration in hours, longitude, latitude (in question)
select get_people('Pavan', 'Peter', 200, 24, 103.912680, 1.309432);
Greenplum PostGIS functions st_distance_sphere() and st_makepoint() calculate the distance between the ATM location and the reference lat, long (< 2 km)
GPText.search() function is used to check whether both people work at ‘Pivotal’
Greenplum and Apache MADlib BFS search checks whether there are direct or indirect links between people
Greenplum Fuzzy String Match function Soundex() checks whether a person's name sounds like ‘Pavan’ or ‘Peter’
Greenplum Time functions calculate whether the time since the withdrawal is < 24 hours
Amount > $200
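The individual predicates described above can be tried in isolation before they are combined. The sketch below assumes the fuzzystrmatch and PostGIS extensions are installed; the second coordinate pair, the alternate spelling, and the three-hour offset are made-up values for illustration only.

```sql
SELECT
    -- Fuzzy name match: equal Soundex codes mean the names sound alike.
    soundex('Pavan') = soundex('Pavaan') AS name_sounds_alike,
    -- Great-circle distance in km between two (longitude, latitude) points.
    st_distance_sphere(
        st_makepoint(103.912680, 1.309432),  -- reference point from the call above
        st_makepoint(103.920000, 1.315000)   -- hypothetical ATM location
    ) / 1000.0 AS distance_km,
    -- Hours elapsed since a hypothetical transaction timestamp.
    extract(epoch from now() - (now() - interval '3 hours')) / 3600 AS hours_ago;
```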
Query terms and the features that handle them: “Pivotal” (GPText), “Peter” / “Pavan” (Fuzzy String Match), links between people (Apache MADlib), “2km” from the ATM (PostGIS), “24” hours, “200”
Lines of code: 3,000+ vs 34
“Investigate a crime suspect whose name sounds like ‘Pavan’, who knows Peter directly, who withdrew Peter’s $500 at an ATM located 2km from Changi yesterday.”
Using a Hadoop Ecosystem: 10 steps, 3000+ Lines of code across 4 different systems
1. LOAD customer data from HDFS and put to HIVE
2. DESCRIPTION Column needs to be indexed
3. SEARCH IN Column & WRITE Result to HDFS
4. WRITE CODE: Pulling Data Into Spark Data Frame
5. WRITE CODE: CHECK Soundex
6. WRITE CODE: MATCH SOLR Result
7. WRITE CODE: GRAPH LINK Analysis
8. WRITE CODE: PostGIS Distance Calculation
9. WRITE CODE: GRAPH LINK Analysis
10. WRITE CODE: WRITE RESULTS TO HIVE TABLE
Using Greenplum: 1 step, 1 query – 34 Lines of Code
One query – using built-in functions: Soundex (sounds like), NLP (work at same company), Machine Learning MADlib (know directly), Time (yesterday), PostGIS (within 2km)
Greenplum
Greenplum
BI
Pivotal Greenplum OSS
•
• 50
• Pivotal Greenplum
• Apache
July 2017:
http://madlib.apache.org
Apache MADlib
•
•
• http://lucene.apache.org/solr/
Apache Solr
•
PostgreSQL
OSS
•
•
• http://postgis.net/
PostGIS
•
•
•
R
• https://www.r-project.org/
R
•
•
•
•
• https://www.python.org/
Python
In-DB
• Open source https://github.com/apache/madlib
• Downloads and docs http://madlib.apache.org/
• Wiki https://cwiki.apache.org/confluence/display/MADLIB/
Apache MADlib: SQL
Apache
PostgreSQL &
Greenplum
Functions
Data Types and Transformations
Array and Matrix Operations
Matrix Factorization
• Low Rank
• Singular Value Decomposition (SVD)
Norms and Distance Functions
Sparse Vectors
Encoding Categorical Variables
Path Functions
Pivot
Sessionize
Stemming
May 2018
Graph
All Pairs Shortest Path (APSP)
Breadth-First Search
Hyperlink-Induced Topic Search (HITS)
Average Path Length
Closeness Centrality
Graph Diameter
In-Out Degree
PageRank and Personalized PageRank
Single Source Shortest Path (SSSP)
Weakly Connected Components
Model Selection
Cross Validation
Prediction Metrics
Train-Test Split
Statistics
Descriptive Statistics
• Cardinality Estimators
• Correlation and Covariance
• Summary
Inferential Statistics
• Hypothesis Tests
Probability Functions
Supervised Learning
Neural Networks
Support Vector Machines (SVM)
Conditional Random Field (CRF)
Regression Models
• Clustered Variance
• Cox-Proportional Hazards Regression
• Elastic Net Regularization
• Generalized Linear Models
• Linear Regression
• Logistic Regression
• Marginal Effects
• Multinomial Regression
• Naïve Bayes
• Ordinal Regression
• Robust Variance
Tree Methods
• Decision Tree
• Random Forest
Time Series Analysis
• ARIMA
Unsupervised Learning
Association Rules (Apriori)
Clustering (k-Means)
Principal Component Analysis (PCA)
Topic Modelling (Latent Dirichlet Allocation)
Utility Functions
Columns to Vector
Conjugate Gradient
Linear Solvers
• Dense Linear Systems
• Sparse Linear Systems
Mini-Batching
PMML Export
Term Frequency for Text
Vector to Columns
Nearest Neighbors
• k-Nearest Neighbors
Sampling
Balanced
Random
Stratified
[Figure: Greenplum cluster. Master Host and Standby Master receive SQL; the Interconnect links Segment Hosts Node1, Node2, Node3, ... NodeN, each with GPU 1 ... GPU N]
In-Database Functions: machine learning & statistics & math & graph & utilities
Massively Parallel Processing
Best of both worlds: GPU-focused and CPU-focused data science workloads
● Unified platform for full range of data science workloads
● Higher productivity due to no data movement
● Persistent data storage and management integrated with core machine learning & API compute engine
Supporting the full spectrum of data science workloads:
Data preparation, feature generation, machine learning, geospatial, deep learning, etc.
Data Types: Structured Data, Unstructured Data, Geographic Data, Real Time Data, Natural Language Data, Time Series Data, Event Data, Network Data, Linked Data
[Figure: data tiers. In-Memory Database, RDBMS, Data Lake; HOT DATA, WARM DATA, COLD DATA]
Platform Extension Framework (PXF)
PXF
(GIS)
: (NICT)
• ( )
• Pivotal Greenplum
• PostGIS: OSS
• Apache MADlib:
OSS
•
•
•
•
•
Hadoop
GPText
Pivotal Greenplum
[Figure: Data Temperature (Cold, Hot, Warm). Labels: PIVOTAL GEMFIRE; PIVOTAL GREENPLUM (Data Warehouse); PIVOTAL GREENPLUM]
[Figure: Pivotal Greenplum platform overview. Labels: Structured Data; JDBC, ODBC; SQL; ANSI SQL; RDBMS; Spark, GemFire, HDFS; JSON, Apache AVRO, Apache Parquet and XML; Teradata SQL; DB SQL; Apache MADlib; Python, R, Java, Perl, C; Apache SOLR; PostGIS; Custom Apps; BI / Reporting; Machine Learning; AI; Kafka; ETL; Spring Cloud Data Flow; (MPP); PostgreSQL; (GPORCA); Command Center; SQL (Hyper-Q); IT]
● Pivotal Greenplum 3
Ø
●
Ø &
Ø PostgreSQL DWH
● Greenplum 6
● DWH Pivotal Greenplum
●
● TW: @greenplummy
● connpass: https://pivotal-japan.connpass.com/
Thank You

Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
 

Introducing Pivotal Greenplum, which continues to evolve as a cloud DWH

  • 1. © Copyright 2019 Pivotal Software, Inc. All rights Reserved. 2019 9 25 Pivotal DWH Pivotal Greenplum
  • 2. Agenda ● Ø Pivotal Greenplum Ø Greenplum - Pivotal Greenplum 6 ● DWH Pivotal Greenplum Ø Ø DWH Ø ●
  • 3. Pivotal Greenplum • Pivotal Data Suite (CPU ) • • • ( ) K8s • MPP DB • • ( etc..) • • •
  • 4. Pivotal Greenplum MPP (Massively Parallel Processing) ... ... x 2 x 2 SQL SQL gNet
  • 5. CPU I/O CPU CPU CPU CPU CPU CPU CPU CPU CPU I/O I/O HW RDB DB
  • 6. ( 1/2) 256GB RAM (x6); Intel E5-2680v3 2CPU,24Core (x6); 1.8TB 10Krpm SAS HDD 6 RAID5 (4 + 1 + 1 ) (x2); 1.8TB 10Krpm SAS HDD 12 RAID5 (10 + 1 + 1 ) (x8); #1 (10Gbps x 52 ) #2 (10Gbps x 52 ); #1 #2 #3 #4; mLAG
  • 7. ( 2/2) 64GB RAM (x6); Intel E5-2660 2CPU,16Core (x6); 300GB 10Krpm SAS HDD 6 RAID5 (4 + 1 + 1 ) (x2); 300GB 10Krpm SAS HDD 12 RAID5 (10 + 1 + 1 ) (x8); #1 (10Gbps x 52 ) #2 (10Gbps x 52 ); #1 #2 #3 #4; HA Bonding #1 #1 #9 #9 #17 #17 5 #25 #25 mLAG
  • 9. Greenplum v1 v4 v5 v6 2003 - 2009 2010 2013 2015-2018 2019 BI
  • 10. DWH • PB • : • SLA • • • OSS #1 DWH
  • 11. September 4, 2019: Now Generally Available
  • 12. Greenplum 6 Postgres: v8.4 – 2314 commits, v9.0 – 1859 commits, v9.1 – 2035 commits, v9.2 – 1945 commits, v9.3 – 1603 commits, v9.4 – 1964 commits. TOTAL: 11,720 Commits Merged. Code Quality via Open Source. Optimized for Big Data in Greenplum. “Customers frequently called out the open-source alignment with PostgreSQL as a strong and cost-effective positive” -- Gartner MQ 2019
  • 13. Greenplum 6 OLTP 70 ● OLTP ● OLTP ● 24,448 TPS for Update transactions in GP6 ● 46,570 TPS for Single Row Insert in GP6 ● 140,000 TPS for Select Only Query in GP6 ● Real world analytical database and data warehouse use cases require a mixed workload of long and short queries as well as updates and deletes
  • 14. “Replicated” “DISTRIBUTED REPLICATED” ● Greenplum ● / ● Replicated REPLICATED DIMENSION TABLES FOR FAST LOCAL JOIN
  • 15. Greenplum 6 Replicated Tables
      With Replicated Tables (size is multiplied by the number of primaries):
        create table table_replicated (a int, b text) distributed replicated;
        insert into table_replicated select id, 'val ' || id from generate_series(1,10000) id;
        select pg_relation_size('table_replicated');
         pg_relation_size
        ------------------
                   917504
      With Non Replicated table:
        create table table_non_replicated (a int, b text) distributed randomly;
        insert into table_non_replicated select id, 'val ' || id from generate_series(1,10000) id;
        select pg_relation_size('table_non_replicated');
         pg_relation_size
        ------------------
                   458752
      The field gp_segment_id doesn't exist in replicated tables:
        select gp_segment_id, count(*) from table_replicated group by 1;
        ERROR: column "gp_segment_id" does not exist
        LINE 1: select gp_segment_id, count(*) from ...
                       ^
        select gp_segment_id, count(*) from table_non_replicated group by 1;
         gp_segment_id | count
        ---------------+-------
                     0 |  5011
                     1 |  4989
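The size difference on this slide can be illustrated with a toy model. The sketch below is not Greenplum code: it assumes two primary segments (as in the slide's output) and uses a simple modulo rule as a stand-in for the real row placement, which on the slide is `distributed randomly`. The names `place_replicated` and `place_distributed` are hypothetical.

```python
# Toy model of row placement across Greenplum primary segments.
# Assumptions: two primaries, and a modulo rule standing in for the
# real (random or hash) distribution performed by the database.
NUM_PRIMARIES = 2


def place_replicated(rows, num_segments=NUM_PRIMARIES):
    """Replicated table: every segment keeps a full copy of all rows."""
    return {seg: list(rows) for seg in range(num_segments)}


def place_distributed(rows, num_segments=NUM_PRIMARIES):
    """Non-replicated table: each row is stored on exactly one segment."""
    placement = {seg: [] for seg in range(num_segments)}
    for row in rows:
        placement[row[0] % num_segments].append(row)
    return placement


# Same data as the slide: 10,000 rows of (id, 'val <id>').
rows = [(i, f"val {i}") for i in range(1, 10001)]
replicated = place_replicated(rows)
distributed = place_distributed(rows)

total_replicated = sum(len(v) for v in replicated.values())
total_distributed = sum(len(v) for v in distributed.values())
print(total_replicated, total_distributed)
```

The stored row count of the replicated layout is the distributed count multiplied by the number of primaries, which mirrors the 917504 versus 458752 byte sizes reported by `pg_relation_size` above.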
  • 16. Greenplum 6 Replicated Tables Query Plan
      With Replicated table (1 slice, no redistribution):
        explain select count(*) from table_fact f inner join table_replicated d on f.a = d.a;
        QUERY PLAN
        ----------------------------------------------------------------------------------------------------
        Aggregate (cost=0.00..874.73 rows=1 width=8)
          -> Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..874.73 rows=1 width=8)
               -> Aggregate (cost=0.00..874.73 rows=1 width=8)
                    -> Hash Join (cost=0.00..874.73 rows=50000 width=1)
                         Hash Cond: (table_fact.a = table_replicated.a)
                         -> Seq Scan on table_fact (cost=0.00..432.15 rows=50000 width=4)
                         -> Hash (cost=431.23..431.23 rows=10000 width=4)
                              -> Seq Scan on table_replicated (cost=0.00..431.23 rows=10000 width=4)
        Optimizer: PQO version 3.29.0
      With Non Replicated table (3 slices):
        explain select count(*) from table_fact f inner join table_non_replicated d on f.a = d.a;
        QUERY PLAN
        ---------------------------------------------------------------------------------------------------------------------
        Aggregate (cost=0.00..874.31 rows=1 width=8)
          -> Gather Motion 2:1 (slice3; segments: 2) (cost=0.00..874.31 rows=1 width=8)
               -> Aggregate (cost=0.00..874.31 rows=1 width=8)
                    -> Hash Join (cost=0.00..874.31 rows=50000 width=1)
                         Hash Cond: (table_fact.a = table_non_replicated.a)
                         -> Redistribute Motion 2:2 (slice1; segments: 2) (cost=0.00..433.15 rows=50000 width=4)
                              Hash Key: table_fact.a
                              -> Seq Scan on table_fact (cost=0.00..432.15 rows=50000 width=4)
                         -> Hash (cost=431.22..431.22 rows=5000 width=4)
                              -> Redistribute Motion 2:2 (slice2; segments: 2) (cost=0.00..431.22 rows=5000 width=4)
                                   Hash Key: table_non_replicated.a
                                   -> Seq Scan on table_non_replicated (cost=0.00..431.12 rows=5000 width=4)
        Optimizer: PQO version 3.29.0
  • 17. H/W ● zStandard ● Facebook OSS ● ● ● CREATE TABLE WITH WITH (compresstype=zstd)
  • 19. SQL CTE Using RECURSIVE, a WITH query can refer to its own output
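As a rough illustration of a WITH query that refers to its own output, the sketch below runs a recursive CTE through Python's built-in sqlite3 module. SQLite is only a stand-in here so the example is self-contained; it is not Greenplum, and the `employees` table and its rows are made up for the example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a real cluster.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "root", None), (2, "a", 1), (3, "b", 2), (4, "c", 1)],
)

# The CTE named "chain" selects from itself in the second branch:
# start at the row without a manager, then repeatedly add direct reports.
query = """
WITH RECURSIVE chain(id, name, depth) AS (
    SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, c.depth + 1
    FROM employees e JOIN chain c ON e.manager_id = c.id
)
SELECT id, name, depth FROM chain ORDER BY depth, id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each row comes back with its depth in the hierarchy, computed by the query alone without any loop in the client code.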
  • 20. ETL Writable CTE Data modifying CTE allows several different operations in the same query
  • 21. Unlogged : ● WAL Unlogged ● : ● DB create unlogged table table_unlogged (a int , b text) distributed randomly;
  • 22.
  • 23. Private Cloud Bare-Metal Public Cloud Greenplum Building Blocks • The most performant way to run Greenplum on premise • Pivotal Blueprint for Dell reference hardware configs • Superior price/performance; no expensive proprietary hardware • Certified and supported by Pivotal Run Greenplum in Any Environment Greenplum for Kubernetes Other Kubernetes (on VMs or not) Google Container Engine Enterprise & Essentials (OSS K8s) • • : 100% •
  • 24. Public Cloud Run Greenplum in Any Environment
  • 25. ● ○ AI Pivotal Greenplum ○ ( ) ● ○ ○ ○ ○ ● ○ ○ DR AZ ○ HA
  • 26. ● 1 ● ● ( ) ● ● 5 ○ ● ● pgBouncer DB ● gpsnap/gpcronsnap - ● IaaS ● ● ● Azure Resource Group Deployment AWS CloudFormation GCP Deployment Manager VM VM VM VM VM X Data Volume Snapshot Restore
  • 27. Run Greenplum in Any Environment Greenplum for Kubernetes Other Kubernetes (on VMs or not) Google Container Engine Enterprise & Essentials(OSS K8s)
  • 29. ● Greenplum (Postgres) / Pod / VM(vMotion) ● Greenplum ● ● ● ● K8s worker 1 K8s worker n PKS/K8s cluster pod pod K8s worker VMs: 8 to 32 GB
  • 30. ● Greenplum ○ ○ ● Greenplum ○ Pod ● VM K8s Greenplum Pod ○ Pod Persistent Volume 1 . . n K8s worker 1 K8s worker n PKS/K8s cluster pod pod
  • 32. Pivotal 2km ATM 24 200 Peter Pavan

      drop function if exists get_people(text,text,integer,integer,float,float);

      CREATE FUNCTION get_people(text,text,integer,integer,float,float) RETURNS integer AS $$
      declare
        linkchk integer;
        v1 record;
        v2 record;
      begin
        execute 'truncate table results;';
        -- outer loop: candidates whose first name sounds like person 1 ($1)
        for v1 in
          select distinct a.id,a.firstname,a.lastname,amount,tran_date,c.lat,c.lng,address,a.description,d.score
          from people a, transactions b, location c,
               (SELECT w.id, q.score
                FROM people w,
                     gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'gpadmin.public.people', 'Pivotal', null) q
                WHERE (q.id::integer) = w.id
                order by 2 desc) d
          where soundex(firstname)=soundex($1)
            and a.id=b.id
            and amount > $3
            -- hours elapsed since the transaction must be below $4
            and (extract(epoch from now()) - extract(epoch from tran_date))/3600 < $4
            -- ATM within 2 km of the reference point ($5 = longitude, $6 = latitude)
            and st_distance_sphere(st_makepoint($5, $6),st_makepoint(c.lng, c.lat))/1000.0 <= 2.0
            and b.locid=c.locid
            and a.id=d.id
        loop
          -- inner loop: candidates whose first name sounds like person 2 ($2)
          for v2 in
            select distinct a.id,a.firstname,a.lastname,amount,tran_date,c.lat,c.lng,address,a.description,d.score
            from people a, transactions b, location c,
                 (SELECT w.id, q.score
                  FROM people w,
                       gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'gpadmin.public.people', 'Pivotal', null) q
                  WHERE (q.id::integer) = w.id
                  order by 2 desc) d
            where soundex(firstname)=soundex($2)
              and a.id=b.id
              and amount > $3
              and (extract(epoch from now()) - extract(epoch from tran_date))/3600 < $4
              and st_distance_sphere(st_makepoint($5, $6),st_makepoint(c.lng, c.lat))/1000.0 <= 2.0
              and b.locid=c.locid
              and a.id=d.id
          loop
            -- breadth-first search from v1 over the links table; dist=1 means a direct link
            execute 'DROP TABLE IF EXISTS out, out_summary;';
            execute 'SELECT madlib.graph_bfs(''people'',''id'',''links'',NULL,'||v1.id||',''out'');';
            select 1 into linkchk from out where dist=1 and id=v2.id;
            if linkchk is not null then
              insert into results values (v1.id,v1.firstname,v1.lastname,v1.amount,v1.tran_date,v1.lat,v1.lng,v1.address,v1.description,v1.score);
              insert into results values (v2.id,v2.firstname,v2.lastname,v2.amount,v2.tran_date,v2.lat,v2.lng,v2.address,v2.description,v2.score);
            end if;
          end loop;
        end loop;
        return 0;
      end
      $$ LANGUAGE plpgsql;

      -- person 1, person 2, amount, duration in hours, longitude, latitude (in question)
      select get_people('Pavan','Peter',200,24,103.912680, 1.309432);

      ● Greenplum PostGIS functions st_distance_sphere() and st_makepoint() calculate the distance between the ATM location and the reference lat, long (< 2 km)
      ● GPText.search() function is used to know if both people work at ‘Pivotal’
      ● Greenplum and Apache MADlib BFS search to know if there are direct or indirect links between people
      ● Greenplum Fuzzy String Match function Soundex() to know if a person's name sounds like ‘Pavan’ or ‘Peter’
      ● Greenplum Time functions to check that the withdrawal happened less than 24 hours ago
      ● Amount > $200
      “Pivotal - GPText Peter Pavan - Fuzzy String Match - Apache MADlib 2km ATM” - PostGIS 24 ” / 200 ”
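The 2 km check in the function above relies on a great-circle distance between two (longitude, latitude) points. The sketch below shows one common way to compute such a distance, the haversine formula on a sphere. It is an illustration only and not the PostGIS implementation: the Earth radius constant and the helper names `distance_sphere_m` and `within_km` are assumptions, and the two ATM coordinates are made up.

```python
import math

# Assumption: mean Earth radius in meters; PostGIS may use a slightly different value.
EARTH_RADIUS_M = 6371000.0


def distance_sphere_m(lng1, lat1, lng2, lat2):
    """Great-circle distance in meters between two (lng, lat) points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_km(ref, atm, limit_km=2.0):
    """Mirror of the slide's filter: distance in meters / 1000.0 <= limit."""
    return distance_sphere_m(*ref, *atm) / 1000.0 <= limit_km


# Reference point from the slide's call: (longitude, latitude).
ref = (103.912680, 1.309432)
near_atm = (103.912680, 1.319432)  # hypothetical ATM about 0.01 degrees north
far_atm = (103.912680, 1.409432)   # hypothetical ATM about 0.1 degrees north

print(within_km(ref, near_atm), within_km(ref, far_atm))
```

As in the SQL, points are passed longitude first, matching the argument order of `st_makepoint($5, $6)`.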
  • 33. : 3,000+ vs 34
      “Investigate a crime suspect whose name sounds like ‘Pavan’, who knows Peter directly, who withdrew Peter’s $500 at an ATM located 2km from Changi yesterday.”
      Using a Hadoop Ecosystem: 10 steps, 3000+ lines of code across 4 different systems
        1. LOAD customer data from HDFS and put to HIVE
        2. DESCRIPTION Column needs to be indexed
        3. SEARCH IN Column & WRITE Result to HDFS
        4. WRITE CODE: Pulling Data Into Spark Data Frame
        5. WRITE CODE: CHECK Soundex
        6. WRITE CODE: MATCH SOLR Result
        7. WRITE CODE: GRAPH LINK Analysis
        8. WRITE CODE: PostGIS Distance Calculation
        9. WRITE CODE: GRAPH LINK Analysis
        10. WRITE CODE: WRITE RESULTS TO HIVE TABLE
      Using Greenplum: 1 step, 1 query – 34 lines of code
      One query – using built-in functions: Soundex (sounds like), NLP (work at same company), Machine Learning MADlib (know directly), Time (yesterday), PostGIS (within 2km)
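The "sounds like" step can be made concrete with a small Soundex sketch. The code below follows the classic American Soundex rules (keep the first letter, encode consonants as digits, collapse adjacent equal codes, pad to four characters). It is an illustration under those assumptions and may differ in edge cases from the `soundex()` function available in the database; the sample names other than ‘Pavan’ and ‘Peter’ are made up.

```python
# Classic American Soundex digit codes; vowels and H, W, Y have no code.
CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name):
    """Return the 4-character Soundex code of a name ('' for empty input)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Append a digit only when it differs from the previous code.
        if code and code != prev:
            result += code
        # H and W do not separate equal codes; vowels do.
        if ch not in "HW":
            prev = code
    return (result + "000")[:4]


for name in ("Pavan", "Pavin", "Peter", "Pieter"):
    print(name, soundex(name))
```

Two names match in the slide's sense when their codes are equal, for example `soundex("Pavan") == soundex("Pavin")`.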
  • 35. Pivotal Greenplum OSS • • 50 • Pivotal Greenplum • Apache 2017 7 : http://madlib.apache.org Apache MADlib • • • http://lucene.apache.org/solr/ Apache Solr • PostgreSQL OSS • • • http://postgis.net/ PostGIS • • • R • https://www.r-project.org/ R • • • • • https://www.python.org/ Python
  • 36. In-DB • Open source https://github.com/apache/madlib • Downloads and docs http://madlib.apache.org/ • Wiki https://cwiki.apache.org/confluence/display/MADLIB/ Apache MADlib: SQL Apache PostgreSQL & Greenplum
  • 37. Functions (May 2018)
      Data Types and Transformations: Array and Matrix Operations; Matrix Factorization (Low Rank, Singular Value Decomposition (SVD)); Norms and Distance Functions; Sparse Vectors; Encoding Categorical Variables; Path Functions; Pivot; Sessionize; Stemming
      Graph: All Pairs Shortest Path (APSP); Breadth-First Search; Hyperlink-Induced Topic Search (HITS); Average Path Length; Closeness Centrality; Graph Diameter; In-Out Degree; PageRank and Personalized PageRank; Single Source Shortest Path (SSSP); Weakly Connected Components
      Model Selection: Cross Validation; Prediction Metrics; Train-Test Split
      Statistics: Descriptive Statistics (Cardinality Estimators, Correlation and Covariance, Summary); Inferential Statistics (Hypothesis Tests); Probability Functions
      Supervised Learning: Neural Networks; Support Vector Machines (SVM); Conditional Random Field (CRF); Regression Models (Clustered Variance, Cox-Proportional Hazards Regression, Elastic Net Regularization, Generalized Linear Models, Linear Regression, Logistic Regression, Marginal Effects, Multinomial Regression, Naïve Bayes, Ordinal Regression, Robust Variance); Tree Methods (Decision Tree, Random Forest)
      Time Series Analysis: ARIMA
      Unsupervised Learning: Association Rules (Apriori); Clustering (k-Means); Principal Component Analysis (PCA); Topic Modelling (Latent Dirichlet Allocation)
      Utility Functions: Columns to Vector; Conjugate Gradient; Linear Solvers (Dense Linear Systems, Sparse Linear Systems); Mini-Batching; PMML Export; Term Frequency for Text; Vector to Columns
      Nearest Neighbors: k-Nearest Neighbors
      Sampling: Balanced; Random; Stratified
  • 38. Greenplum Standby Master … Master Host SQL Interconnect Segment Host Node1 Segment Host Node2 Segment Host Node3 Segment Host Node N GPU N … GPU 1 GPU N … GPU 1 GPU N … GPU 1 … GPU N … GPU 1 In-Database Functions Machine learning & statistics & math & graph & utilities Massively Parallel Processing Best of both worlds: GPU-focused and CPU-focused data science workloads ● Unified platform for full range of data science workloads ● Higher productivity due to no data movement ● Persistent data storage and management integrated with core machine learning & API compute engine Supporting the full spectrum of data science workloads: Data preparation, feature generation, machine learning, geospatial, deep learning, etc.
  • 40. ?
  • 43. PXF
  • 44. (GIS) : (NICT) • ( ) • Pivotal Greenplum • PostGIS: OSS • Apache MADlib: OSS • • • • •
  • 46. PIVOTAL GREENPLUM Structured Data JDBC, ODBC SQL ANSI SQL RDBMS Spark GemFire HDFS JSON, Apache AVRO, Apache Parquet and XML Teradata SQL DB SQL Apache MADlib / / Python, R, Java, Perl, C Apache SOLR PostGIS Custom Apps BI / Reporting Machine Learning AI Pivotal Greenplum Kafka ETL Spring Cloud Data Flow (MPP) PostgreSQL (GPORCA) Command Center SQL (Hyper-Q) IT
  • 47. ● Pivotal Greenplum 3 Ø ● Ø & Ø PostgreSQL DWH ● Greenplum 6 ● DWH Pivotal Greenplum ● ● TW: @greenplummy ● connpass: https://pivotal-japan.connpass.com/
  • 48. Thank You