2. Who are we
• Shigeru HANADA
  • From Tokyo, Japan
  • Working on FDW since 2010
  • Implemented initial FDW API and postgres_fdw
• Etsuro Fujita
  • From Tokyo, Japan
  • Working on Postgres for 10 years
  • Interested in FDW enhancements
3. Agenda
• Past enhancements proposed for 9.5
  • Inheritance support (Committed)
  • Join push-down (Committed)
  • Join push-down for postgres_fdw (Returned with feedback)
  • Update push-down (Returned with feedback)
• Possible remote query optimization in 9.5
• Ideas for further enhancement
  • Sort push-down
  • Aggregate push-down
  • More aggressive join push-down
• Discussions
5. Inheritance support
• Outline
  • Allow foreign table to participate in inheritance tree
  • A way to implement sharding
• Example
postgres=# explain verbose select * from parent ;
QUERY PLAN
---------------------------------------------------------------------------
Append (cost=0.00..270.00 rows=2001 width=4)
  -> Seq Scan on public.parent (cost=0.00..0.00 rows=1 width=4)
       Output: parent.a
  -> Foreign Scan on public.ft1 (cost=100.00..135.00 rows=1000 width=4)
       Output: ft1.a
       Remote SQL: SELECT a FROM public.t1
  -> Foreign Scan on public.ft2 (cost=100.00..135.00 rows=1000 width=4)
       Output: ft2.a
       Remote SQL: SELECT a FROM public.t2
(9 rows)
6. Update push-down
• Outline
  • Send the whole UPDATE/DELETE statement when it has the same semantics on the remote side
• Example
Current:
postgres=# explain verbose update foo set a = a + 1 where a > 10;
QUERY PLAN
--------------------------------------------------------------------------------
Update on public.foo (cost=100.00..139.78 rows=990 width=10)
  Remote SQL: UPDATE public.foo SET a = $2 WHERE ctid = $1
  -> Foreign Scan on public.foo (cost=100.00..139.78 rows=990 width=10)
       Output: (a + 1), ctid
       Remote SQL: SELECT a, ctid FROM public.foo WHERE ((a > 10)) FOR UPDATE
(5 rows)

Patched:
postgres=# explain verbose update foo set a = a + 1 where a > 10;
QUERY PLAN
-----------------------------------------------------------------------------
Update on public.foo (cost=100.00..139.78 rows=990 width=10)
  -> Foreign Update on public.foo (cost=100.00..139.78 rows=990 width=10)
       Remote SQL: UPDATE public.foo SET a = (a + 1) WHERE ((a > 10))
(3 rows)
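The difference between the two plans can be sketched in a few lines of Python. This is only an illustration of the idea, not postgres_fdw code: the function name, its parameters, and the `safe_to_push_down` flag are all hypothetical, and a real FDW would derive that flag by checking that every expression has the same semantics on the remote side.

```python
def deparse_remote_update(table, assignments, referenced_columns, quals,
                          safe_to_push_down):
    """Return the remote statements for an UPDATE on a foreign table.

    assignments: dict of column name -> SQL expression text (hypothetical input)
    referenced_columns: columns the local side must fetch to evaluate expressions
    quals: list of already-deparsed WHERE conditions
    safe_to_push_down: hypothetical flag, True when the remote side is assumed
        to evaluate every expression with identical semantics
    """
    where = " AND ".join(f"({q})" for q in quals)

    if safe_to_push_down:
        # Pushed-down form: one statement, all work happens remotely.
        set_list = ", ".join(f"{col} = ({expr})"
                             for col, expr in assignments.items())
        sql = f"UPDATE {table} SET {set_list}"
        if where:
            sql += f" WHERE {where}"
        return [sql]

    # Row-by-row form: lock and fetch the rows, then update each one by ctid.
    select = f"SELECT {', '.join(referenced_columns)}, ctid FROM {table}"
    if where:
        select += f" WHERE {where}"
    select += " FOR UPDATE"
    params = ", ".join(f"{col} = ${i}"
                       for i, col in enumerate(assignments, start=2))
    update = f"UPDATE {table} SET {params} WHERE ctid = $1"
    return [select, update]


if __name__ == "__main__":
    for safe in (False, True):
        print(deparse_remote_update("public.foo", {"a": "a + 1"}, ["a"],
                                    ["a > 10"], safe))
```

With `safe_to_push_down=False` the sketch yields a SELECT ... FOR UPDATE plus a parameterized UPDATE, similar in shape to the current plan; with `True` it yields a single UPDATE, similar in shape to the patched plan.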
7. Update push-down, cont.
• Issues
  • FDW-APIs for update push-down
    • Called from nodeModifyTable.c or nodeForeignscan.c?
  • Update push-down for an update on a join
    • "UPDATE foo ... FROM bar ..." (both foo and bar are remote)
• Further enhancements
  • INSERT/UPSERT push-down
8. Join push-down
• Outline
  • Join foreign tables on remote side, if it’s safe
• Example
fdw=# EXPLAIN (VERBOSE) SELECT tbalance FROM pgbench_branches b JOIN pgbench_tellers t USING(bid);
QUERY PLAN
---------------------------------------------------------------------------
Foreign Scan (cost=100.00..101.00 rows=50 width=4)
  Output: t.tbalance
  Relations: (public.pgbench_branches b) INNER JOIN (public.pgbench_tellers t)
  Remote SQL: SELECT r.a1 FROM (SELECT l.a9 FROM (SELECT bid a9 FROM public.pgbench_branches) l) l (a1) INNER JOIN (SELECT r.a11, r.a10 FROM (SELECT bid a10, tbalance a11 FROM public.pgbench_tellers) r) r (a1, a2) ON ((l.a1 = r.a2))
(4 rows)
9. Join push-down, cont.
• Issues
  • Implement postgres_fdw to handle join APIs
  • Centralize deparsing remote query
    • Should we use the parse tree rather than planner information to generate the join query?
    • A generic SQL deparser would help porting to FDWs for other RDBMSs
10. Possible remote query optimization in 9.5
• When we run the following query (“scores” and “classes” are foreign tables):
SELECT c.grade, max(s.score) max_score
  FROM scores s LEFT JOIN classes c
    ON c.class_id = s.class_id
  WHERE c.subject = 'Math'
  GROUP BY c.grade
  HAVING max(s.score) > 50
  ORDER BY c.grade DESC;
11. Possible remote query optimization in 9.5
• When we run the following query:
SELECT c.grade, max(s.score) max_score
  FROM scores s LEFT JOIN classes c
    ON c.class_id = s.class_id
  WHERE c.subject = 'Math'
  GROUP BY c.grade
  HAVING max(s.score) > 50
  ORDER BY c.grade DESC;
• Generate remote query:
SELECT c.grade, s.score
  FROM scores s LEFT JOIN classes c
    ON c.class_id = s.class_id
  WHERE c.subject = 'Math'
  ORDER BY c.grade DESC;
• We can push down the portions of the query shown in red on the slide: the join, the WHERE clause, and the ORDER BY, as in the remote query
12. Possible remote query optimization in 9.5
postgres=# EXPLAIN SELECT c.grade, max(s.score) max_score
postgres-# FROM scores s LEFT JOIN classes c
postgres-# ON c.class_id = s.class_id
postgres-# WHERE c.subject= 'Math'
postgres-# GROUP BY c.grade
postgres-# HAVING max(s.score) > 50
postgres-# ORDER BY c.grade DESC;
QUERY PLAN
----------------------------------------------------------------------------------
GroupAggregate (cost=27.92..27.94 rows=1 width=8)
  Group Key: c.grade
  Filter: (max(s.score) > 50)
  -> Sort (cost=27.92..27.92 rows=1 width=8)
       Sort Key: c.grade DESC
       -> Hash Join (cost=20.18..27.91 rows=1 width=8)
            Hash Cond: (s.class_id = c.class_id)
            -> Seq Scan on scores s (cost=0.00..6.98 rows=198 width=8)
            -> Hash (cost=20.12..20.12 rows=4 width=8)
                 -> Seq Scan on classes c (cost=0.00..20.12 rows=4 width=8)
                      Filter: (subject = 'Math'::text)
(11 rows)
14. Ideas for further enhancement
• Sort push-down
• Aggregate push-down
• More aggressive join push-down
• 2PC support (out of scope of this session)
  • Will be discussed in Ashutosh’s session on 19th Jun.
15. Sort push-down
• Outline
  • Mark a ForeignScan as sorted
• Efficacy
  • Avoid unnecessary sort on local side
  • Use ForeignScan as a source of MergeJoin directly
• How to implement
  • Add extra ForeignPath with pathkeys
  • Estimate costs of pre-sorted path
  • Sort result of a foreign scan
    • Add ORDER BY, in RDBMS FDWs
    • Choose pre-sorted file, in file-based FDWs
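For an RDBMS-backed FDW, the "add ORDER BY" step could look roughly like the Python sketch below. It is not postgres_fdw code. The function, the `SAFE_SORT_TYPES` allowlist, and the column types used in the usage example are assumptions made for illustration. The sketch only requests a remote sort when every sort key has a non-character type, as a conservative guard against collation differences, and it spells out NULLS FIRST/LAST to match PostgreSQL's defaults.

```python
# Hypothetical allowlist: types whose ordering is assumed identical on both sides.
SAFE_SORT_TYPES = {"int2", "int4", "int8", "numeric", "date", "timestamp"}


def deparse_sorted_scan(table, columns, sort_keys, column_types):
    """Build a remote SELECT, adding ORDER BY only when it is considered safe.

    sort_keys: list of (column, descending) tuples requested by the planner
    column_types: dict of column name -> type name (hypothetical catalog info)
    Returns (sql, is_sorted); is_sorted tells the caller whether the scan
    result may be treated as pre-sorted.
    """
    sql = f"SELECT {', '.join(columns)} FROM {table}"

    all_safe = all(column_types.get(col) in SAFE_SORT_TYPES
                   for col, _ in sort_keys)
    if not sort_keys or not all_safe:
        # Fall back to an unsorted scan; the local side sorts if needed.
        return sql, False

    order = ", ".join(
        f"{col} {'DESC NULLS FIRST' if desc else 'ASC NULLS LAST'}"
        for col, desc in sort_keys
    )
    return f"{sql} ORDER BY {order}", True


if __name__ == "__main__":
    types = {"grade": "int4", "subject": "text"}  # assumed column types
    print(deparse_sorted_scan("public.classes", ["grade", "subject"],
                              [("grade", True)], types))
    print(deparse_sorted_scan("public.classes", ["grade", "subject"],
                              [("subject", False)], types))
```

The second call falls back to an unsorted scan because `subject` is a character column in this example.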
16. Sort push-down
• Issues
  • How can we limit candidates of sort keys?
    • No brute-force approach
    • Introduce FOREIGN INDEX to represent generic remote indexes?
    • Introduce FDW-specific catalogs?
    • Extract key information from ORDER BY, JOIN, GROUP BY?
  • How can we ensure that the semantics of ordering are identical?
    • Even between PostgreSQL servers, we have collation issues.
    • Is it OK to leave it to DBAs?
    • Limiting to non-character data types seems a way to go for the first cut.
  • Can we use pre-sorted join results as sorted path?
    • A MergeJoin as the root node of the remote query means the result is sorted by the join key, but the plan is not certain even if we execute EXPLAIN before the query.
    • Any idea?
17. Aggregate push-down
• Outline
  • Replace an Aggregate/GroupAggregate/HashAggregate plan node with a ForeignScan which produces aggregated results
• Efficacy
  • Reduce amount of data transferred
  • Off-load overhead of aggregation
• How to implement
  • New FDW API for aggregation hooking
  • Implement API in each FDW
18. Aggregate push-down
• Issues
  • GROUP BY requires identical semantics about grouping keys.
    • We have a similar issue to sort push-down.
  • How can we map functions to remote ones?
    • ROUTINE MAPPING is defined in the SQL standard, but it doesn’t seem well-designed.
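To make the function-mapping question concrete, here is a small Python sketch of a lookup-table approach. It is only one possible design and is not an existing FDW API: `ROUTINE_MAP`, `deparse_aggregate`, and the table and column names in the usage example are hypothetical. The aggregate is pushed down only when every function has a known remote counterpart.

```python
# Hypothetical mapping from local aggregate names to remote function names.
ROUTINE_MAP = {"max": "MAX", "min": "MIN", "sum": "SUM",
               "count": "COUNT", "avg": "AVG"}


def deparse_aggregate(table, group_cols, aggregates, routine_map=ROUTINE_MAP):
    """Build a remote aggregate query, or return None if it cannot be mapped.

    group_cols: list of grouping column names
    aggregates: list of (local_function_name, column) tuples
    Returning None signals that aggregation must stay on the local side.
    """
    targets = list(group_cols)
    for func, col in aggregates:
        remote_func = routine_map.get(func.lower())
        if remote_func is None:
            return None  # no known remote equivalent, do not push down
        targets.append(f"{remote_func}({col})")

    sql = f"SELECT {', '.join(targets)} FROM {table}"
    if group_cols:
        sql += f" GROUP BY {', '.join(group_cols)}"
    return sql


if __name__ == "__main__":
    print(deparse_aggregate("scores", ["class_id"], [("max", "score")]))
    print(deparse_aggregate("scores", [], [("median", "score")]))
```

A static table like this sidesteps ROUTINE MAPPING entirely, at the cost of hard-coding the mapping per FDW. It also does not address the grouping-key semantics issue raised above.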
19. More aggressive join push-down
• Outline
  • Send local data to the remote side and join it there, in one of the following ways:
    • VALUES expression in FROM clause
    • Per-table replication, with logical replication, Slony-I, etc.
• Efficacy
  • Reduce amount of data transferred from remote to local
  • Limited to cases where a small local table is joined with a huge remote table and the join produces a small result
20. More aggressive join push-down
• How to implement
  • Replace reference to a small local table with VALUES()
  • Use a remote replicated table as an alternative
• Issues
  • How can we construct VALUES() expression?
  • How can we know a table is replicated on the remote side?
• Example (the VALUES list is generated by scanning the small local table)
SELECT *
  FROM huge_remote_table h
  JOIN
    (VALUES (1, 'foo'), (2, 'bar')) AS s (id, name)
    ON s.id;
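One way to approach the VALUES() construction question is sketched below in Python. This is an illustration under stated assumptions, not an FDW implementation: `quote_literal` and `build_values_clause` are hypothetical helpers, only a few Python value types are handled, and the quoting assumes the remote server uses standard-conforming strings.

```python
def quote_literal(value):
    """Render a Python value as an SQL literal (minimal, illustrative rules)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int, bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    # Treat everything else as text and double embedded single quotes.
    return "'" + str(value).replace("'", "''") + "'"


def build_values_clause(rows, alias, column_names):
    """Turn rows scanned from a small local table into a VALUES subquery.

    rows: non-empty list of tuples, one per local row
    alias / column_names: names given to the VALUES list in the remote query
    """
    if not rows:
        # An empty VALUES list is not valid SQL, so the caller must handle it.
        raise ValueError("cannot build VALUES() from an empty table")
    tuples = ", ".join(
        "(" + ", ".join(quote_literal(v) for v in row) + ")" for row in rows
    )
    return f"(VALUES {tuples}) AS {alias} ({', '.join(column_names)})"


if __name__ == "__main__":
    local_rows = [(1, "foo"), (2, "bar")]
    print(build_values_clause(local_rows, "s", ["id", "name"]))
```

The generated text grows with the local table, which is one reason the technique only pays off when the local side is small.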