Weitere ähnliche Inhalte Ähnlich wie Postgres.foreign.data.wrappers.2015 (20) Kürzlich hochgeladen (20) Postgres.foreign.data.wrappers.20151. © 2014 EnterpriseDB Corporation. All rights reserved. 1
Postgres for Integrating MongoDB,
Hadoop and Others with FDWs
To hear the recording of this presentation visit Enterprisedb.com >
Resources > Webcasts > On Demand Webcasts
2. © 2014 EnterpriseDB Corporation. All rights reserved. 2
• For most customers, big data
consists of:
• What do I do with it?
− Load it into an database
system to perform analysis
What’s the Big Deal about Big Data?
Web Logs
Sensor Data
Customer
Interactions
Purchase History
Transaction Logs
MongoDB
Hadoop
Cassandra
RedisPostgreSQL
CouchDB
Social Media
3. © 2014 EnterpriseDB Corporation. All rights reserved. 3
In an Ideal World, You Would Have a
Centralized View
Customer
Interactions
OLTP Live
Data
Purchase
History
The Logical Data
Warehouse
Web LogsTransaction
Logs
• We call this the Logical Data Warehouse
4. © 2014 EnterpriseDB Corporation. All rights reserved. 4
• Core PostgeSQL Features to Support ‘Big Data’
− Flexible Datatypes – JSON / JSONB and Key Value Store
− Unlogged tables to improve performance
• Foreign Data Wrappers (FDW)
− Use PostgreSQL as a central interface to connect to other
systems to gather data and issue queries or joins
− Push-down for where and columns improve performance
• Postgres Plus features
− Resource Management to more effectively run mixed
workloads
− EDB*Loader to load data from various sources
EnterpriseDB’s Strategy around Big
Data focuses on 3 Areas
5. © 2014 EnterpriseDB Corporation. All rights reserved. 5
• HSTORE
− Key-value pair
− Simple, fast and easy
− Postgres v 8.2 – pre-dates many NoSQL-only solutions
− Ideal for flat data structures that are sparsely populated
• JSON
− Hierarchical document model
− Introduced in Postgres 9.2, perfected in 9.3
• JSONB
− Binary version of JSON
− Faster, more operators and even more robust
− Postgres 9.4
Postgres’ Provides Flexible Data
Types
6. © 2014 EnterpriseDB Corporation. All rights reserved. 6
• Creating a table with a JSONB field
CREATE TABLE json_data (data JSONB);!
• Simple JSON data element:
{"name": "Apple Phone", "type": "phone", "brand":
"ACME", "price": 200, "available": true,
"warranty_years": 1}!
• Inserting this data element into the table json_data
INSERT INTO json_data (data) VALUES !
!(’ { !"name": "Apple Phone", !
! !"type": "phone", !
! !"brand": "ACME", !
! !"price": 200, !
! !"available": true, !
! !"warranty_years": 1 ! !!
!} ')!
JSON Examples
7. © 2014 EnterpriseDB Corporation. All rights reserved. 7
SELECT DISTINCT !
!data->>'name' as products !
FROM json_data;
!
products !
------------------------------!
Cable TV Basic Service Package!
AC3 Case Black!
Phone Service Basic Plan!
AC3 Phone!
AC3 Case Green!
Phone Service Family Plan!
AC3 Case Red!
AC7 Phone!
A simple query for JSON data
This query does not
return JSON data – it
returns text values
associated with the
key ‘name’
8. © 2014 EnterpriseDB Corporation. All rights reserved. 8
SELECT DISTINCT
product_type,
data->>'brand' as Brand,
data->>'available' as Availability
FROM json_data
JOIN products
ON (products.product_type=json_data.data->>'name')
WHERE json_data.data->>'available'=true;
product_type | brand | availability
---------------------------+-----------+--------------
AC3 Phone | ACME | true
JSON and ANSI SQL Example
ANSI SQL
JSON
No need for programmatic logic to combine SQL and
NoSQL in the application – Postgres does it all
9. © 2014 EnterpriseDB Corporation. All rights reserved. 9
• Every write in PostgreSQL is essentially two writes due
to Write Ahead Log (WAL)
− WAL guarantees durability and support replication
• Unlogged tables are freed from this constraint
− But tables are no longer crash safe!
• Can see good performance gain (~13-17%)
Unlogged Tables Improve
Performance
10. © 2014 EnterpriseDB Corporation. All rights reserved. 10
• Make external data sources look like local tables
• Use SQL
− SELECT syntax; including useful clauses like DISTINCT,
ORDER BY, GROUP BY and more.
− JOIN external data with internal tables
− FUNCTIONS for comparison, math, string, pattern matching,
date/time, etc
− Starting in 9.3 - INSERT / UPDATE / DELETE too
• Predicate pushdown – Filter data on remote sources first!
− SELECT and WHERE clauses today
− Roadmap - Join, Group/Aggregate, Sort and Limit
Some Background on Foreign
Data Wrappers
11. © 2014 EnterpriseDB Corporation. All rights reserved. 11
https://www.github.com/EnterpriseDB
• We have contributed new Mongo FDW, MySQL FDW
and HDFS (Hadoop) FDW
• We focused on adding Enterprise capabilities
− Connecting Pooling
− Predicate pushdowns
− Where clause pushdown
− Column pushdown
− Writeablity (DML support)
EnterpriseDB is investing in Foreign
Data Wrappers
12. © 2014 EnterpriseDB Corporation. All rights reserved. 12
The HDFS_FDW
Postgres Plus
Postgres
HDFS_FDW
15. © 2014 EnterpriseDB Corporation. All rights reserved. 15
Current Release
• Alpha Release – February
2015
• Support for HiveServer 1 & 2
• Complete Select support
• Pushdown features (Where
clause and query columns)
• Connection pooling
• Use of map reduce jobs for
aggregates and complex
query operations.
Roadmap
• Writeability via Hbase
• Support for Flum/Impala server
• More pushdown features (joins,
aggregates, sort etc)
• Custom Map Reduce Jobs
• Authentication support
• Installers to be provided via
EnterpriseDB’s GUI Installer,
RPM and DEB packages
EnterpriseDB’s HDFS_FDW is
Available and Improvements Planned
16. © 2014 EnterpriseDB Corporation. All rights reserved. 16
Postgres Plus
Advanced
Server
Resource
Manager
(CPU & I/O)
Reporting
Transactions
80%
20%
Run Mixed Workloads More Efficiently
with PPAS 9.4 Resource Management
• DBA assigns CPU & I/O to job groups
• Allocates and prioritizes consumption of resources
• Low priority jobs don’t hurt high priority jobs
17. © 2014 EnterpriseDB Corporation. All rights reserved. 17
• Create Resource Groups and assign the CPU Rate Limit
• Use these resource groups during a psql session
Run Mixed Workloads More Efficiently
with PPAS 9.4 Resource Management
- Statements executed will be limited in CPU or writing to shared_buffers per
resource group.
- Individual limits are computed based on recent CPU / shared_buffer usage.
- Processes sleep when necessary & avoid sleep when holding critical locks.
- The limits are adjusted regularly based on current usage.
- If multiple processes in the same group are being executed, aggregate usage
will be limited
CREATE RESOURCE GROUP resgrp_a;
CREATE RESOURCE GROUP resgrp_b;
ALTER RESOURCE GROUP resgrp_a SET cpu_rate_limit = .25;
ALTER RESOURCE GROUP resgrp_a SET dirty_rate_limit = 12288;
SET edb_resource_group TO res_grp_a;
18. © 2014 EnterpriseDB Corporation. All rights reserved. 18
• Conventional path, direct path, and parallel
direct load loading methods
• Data loading from standard input and remote loading,
particularly useful for large data sources on remote
hosts
• Input data with delimiter-separated or fixed-width fields
• Bad file for collecting rejected records
• Discard file for collecting records that do not meet the
selection criteria of any target table
• Log file for recording the EDB*Loader session and any
error messages
Efficiently load data using
EDB*Loader
19. © 2014 EnterpriseDB Corporation. All rights reserved. 19
• Core PostgeSQL Features to Support ‘Big Data’
− Flexible Datatypes – JSON / JSONB and Key Value Store
− Unlogged tables to improve performance
• Foreign Data Wrappers (FDW)
− Use PostgreSQL as a central interface to connect to other
systems to gather data and issue queries or joins
− Push-down for where and columns improve performance
• Postgres Plus features
− Resource Management to more effectively run mixed
workloads
− EDB*Loader to load data from various sources
Recap: EnterpriseDB’s Strategy around
Big Data focuses on 3 Areas
21. © 2014 EnterpriseDB Corporation. All rights reserved. 21
Postgres Plus
Other Other
Postgres
OtherOther