SlideShare a Scribd company logo
1 of 24
Presentation  slide  for  Sqoop  User  Meetup  (Strata  +  Hadoop  World  NYC  2013)  

Complex  stories  about
Sqooping  PostgreSQL  data
10/28/2013
NTT  DATA  Corporation
Masatake  Iwasaki

Copyright © 2013 NTT DATA Corporation
Introduction

Copyright © 2013 NTT DATA Corporation

2
About  Me
Masatake  Iwasaki:
Software  Engineer  @  NTT  DATA:

NTT(Nippon  Telegraph  and  Telephone  Corporation):  Telecommunication
NTT  DATA:  Systems  Integrator

Developed:
Ludia:  Fulltext  search  index  for  PostgreSQL  using  Senna
Authored:
“A  Complete  Primer  for  Hadoop”  
(no  official  English  title)
Patches  for  Sqoop:
SQOOP-‐‑‒390:  PostgreSQL  connector  for  direct  export  with  pg_̲bulkload
SQOOP-‐‑‒999:  Support  bulk  load  from  HDFS  to  PostgreSQL  using  COPY  ...  FROM  
SQOOP-‐‑‒1155:  Sqoop  2  documentation  for  connector  development  

Copyright © 2013 NTT DATA Corporation

3
Why  PostgreSQL?

Enterprisy
from  earlier  version
comparing  to  MySQL
Active  community  in  Japan
NTT  DATA  commits  itself  to  development

Copyright © 2013 NTT DATA Corporation

4
Sqooping  PostgreSQL  data

Copyright © 2013 NTT DATA Corporation

5
Working  around  PostgreSQL
Direct  connector  for  PostgreSQL  loader:
    SQOOP-‐‑‒390:  PostgreSQL  connector  for  direct  export  with  pg_̲bulkload
Yet  another  direct  connector  for  PostgreSQL  JDBC:
    SQOOP-‐‑‒999:  Support  bulk  load  from  HDFS  to  PostgreSQL
                                            using  COPY  ...  FROM
Supporting  complex  data  types:
    SQOOP-‐‑‒1149:  Support  Custom  Postgres  Types
  

Copyright © 2013 NTT DATA Corporation

6
Direct  connector  for  PostgreSQL  loader

SQOOP-‐‑‒390:  
    PostgreSQL  connector  for  direct  export  with  pg_̲bulkload

pg_̲bulkload:
Data  loader  for  PostgreSQL
Server  side  plug-‐‑‒in  library  and  client  side  command
Providing  filtering  and  transformation  of  data
http://pgbulkload.projects.pgfoundry.org/

Copyright © 2013 NTT DATA Corporation

7
SQOOP-‐‑‒390:  
    PostgreSQL  connector  for  direct  export  with  pg_̲bulkload	
HDFS

File  Split

File  Split

File  Split
CREATE  TABLE
    tmp3(LIKE  dest  INCLUDING  CONSTRAINTS)

Mapper

Mapper

Mapper

pg_̲bulkload

pg_̲bulkoad

pg_̲bulkload

tmp1

PostgreSQL

tmp2

Reducer

Destination  Table

Copyright © 2013 NTT DATA Corporation

tmp3

external  process
staging  table  per  mapper  is  must
due  to  table  level  locks
BEGIN
INSERT  INTO  dest  (  SELECT  *  FROM  tmp1  )
DROP  TABLE  tmp1
INSERT  INTO  dest  (  SELECT  *  FROM  tmp2  )
DROP  TABLE  tmp2
INSERT  INTO  dest  (  SELECT  *  FROM  tmp3  )
DROP  TABLE  tmp3
COMMIT
8
Direct  connector  for  PostgreSQL  loader

Pros:
Fast

by  short-‐‑‒circuitting  server  functionality

Flexible

filtering  error  records

Cons:
Not  so  fast

Bottleneck  is  not  in  client  side  but  in  DB  side
Built-‐‑‒in  COPY  functionality  is  fast  enough

Not  General

pg_̲bulkload  supports  only  export

Requiring  setup  on  all  slave  nodes  and  client  node
Possible  to  Require  recovery  on  failure
Copyright © 2013 NTT DATA Corporation

9
Yet  another  direct  connector  for  PostgreSQL  JDBC
PostgreSQL  provides  custom  SQL  command  for  data  import/export
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | STDIN }
[ [ WITH ] ( option [, ...] ) ]
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
TO { 'filename' | STDOUT }
[ [ WITH ] ( option [, ...] ) ]
where option can be one of:
FORMAT format_name
OIDS [ boolean ]
DELIMITER 'delimiter_character'
NULL 'null_string'
HEADER [ boolean ]
QUOTE 'quote_character'
ESCAPE 'escape_character'
FORCE_QUOTE { ( column_name [, ...] ) | * }
FORCE_NOT_NULL ( column_name [, ...] )
ENCODING 'encoding_name‘

AND  JDBC  API
org.postgresql.copy.*
Copyright © 2013 NTT DATA Corporation

10
SQOOP-‐‑‒999:
    Support  bulk  load  from  HDFS  to  PostgreSQL  using  COPY  ...  FROM	

File  Split

File  Split

File  Split

Mapper

Mapper

Mapper

PostgreSQL
JDBC

HDFS

PostgreSQL
JDBC

PostgreSQL
JDBC

Using  custom  SQL  command  
via  JDBC  API
only  available  in  PostgreSQL
COPY  FROM  STDIN  WITH  ...

staging  tagle
PostgreSQL
Destination  Table

Copyright © 2013 NTT DATA Corporation

11
SQOOP-‐‑‒999:
    Support  bulk  load  from  HDFS  to  PostgreSQL  using  COPY  ...  FROM  
import org.postgresql.copy.CopyManager;
import org.postgresql.copy.CopyIn;
...

Requiring  PostgreSQL  
specific  interface.

protected void setup(Context context)
...
dbConf = new DBConfiguration(conf);
CopyManager cm = null;
...
public void map(LongWritable key, Writable value, Context context)
...
if (value instanceof Text) {
line.append(System.getProperty("line.separator"));
}
try {
byte[]data = line.toString().getBytes("UTF-8"); Just  feeding  lines  of  text
copyin.writeToCopy(data, 0, data.length);
Copyright © 2013 NTT DATA Corporation

12
Yet  another  direct  connector  for  PostgreSQL  JDBC

Pros:
Fast  enough
Ease  of  use

JDBC  driver  jar  is  distributed  automatically  by  MR  framework

Cons:
Dependency  on  not  general  JDBC

possible  licensing  issue  (PostgreSQL  is  OK,  itʼ’s  BSD  Lisence)
build  time  requirement  (PostgreSQL  JDBC  is  available  in  Maven  repo.)
<dependency org="org.postgresql" name="postgresql"
rev="${postgresql.version}" conf="common->default" />

Error  record  causes  rollback  of  whole  transaction
Still  difficult  to  implement  custom  connector  for  IMPORT
because  of  code  generation  part

Copyright © 2013 NTT DATA Corporation

13
Supporting  complex  data  types
PostgreSQL  supports  lot  of  complex  data  types  
Geometric  Types
Points
Line  Segments
Boxes
Paths
Polygons
Circles

Network  Address  Types
inet
cidr
macaddr

XML  Type
JSON  Type

Supporting  complex  data  types:
    SQOOP-‐‑‒1149:  Support  Custom  Postgres  Types
Copyright © 2013 NTT DATA Corporation

not  me
14
Constraints  on  JDBC  data  types  in  Sqoop  framework
protected Map<String, Integer> getColumnTypesForRawQuery(String stmt) {
...
results = execute(stmt);
...
ResultSetMetaData metadata = results.getMetaData();
for (int i = 1; i < cols + 1; i++) {
int typeId = metadata.getColumnType(i);

returns  java.sql.Types.OTHER  for  types  
{ not  mappable  to  basic    Java  data  types
=>  Losing  type  imformation    

public String toJavaType(int sqlType)
// Mappings taken from:
// http://java.sun.com/j2se/1.3/docs/guide/jdbc/getstart/mapping.html
if (sqlType == Types.INTEGER) {
return "Integer";
} else if (sqlType == Types.VARCHAR) {
return "String";
....
reaches  here
} else {
// TODO(aaron): Support DISTINCT, ARRAY, STRUCT, REF, JAVA_OBJECT.
// Return null indicating database-specific manager should return a
// java data type if it can find one for any nonstandard type.
return null;
Copyright © 2013 NTT DATA Corporation

15
Sqoop  1  Summary

Pros:
Simple  Standalone  MapReduce  Driver
Easy  to  understand  for  MR  application  developpers

                                                                            except  for  ORM  (SqoopRecord)  code  generation  part.

Variety  of  connectors
Lot  of  information
Cons:
Complex  command  line  and  inconsistent  options
meaning  of  options  is  according  to  connectors

Not  enough  modular
Dependency  on  JDBC  data  model
Security
Copyright © 2013 NTT DATA Corporation

16
Sqooping  PostgreSQL  Data  2

Copyright © 2013 NTT DATA Corporation

17
Sqoop  2
Everything  are  rewritten
Working  on  server  side
More  modular
Not  compatible  with  Sqoop  1  at  all
(Almost)  Only  generic  connector
Black  box  comparing  to  Sqoop  1
Needs  more  documentation

SQOOP-‐‑‒1155:  Sqoop  2  documentation  for  connector  development
Internal of Sqoop2 MapReduce Job
++++++++++++++++++++++++++++++++
...
- OutputFormat invokes Loader's load method (via SqoopOutputFor
.. todo: sequence diagram like figure.
Copyright © 2013 NTT DATA Corporation

18
Sqoop2:  Initialization  phase  of  IMPORT  job
	
 
Implement  this
	
 	
 	
 	
 	
 	
 	
 ,----------------.	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 ,-----------.	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 |SqoopInputFormat|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |Partitioner|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 `-------+--------'	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 `-----+-----'	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 getSplits	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 ----------->|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 getPartitions	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |------------------------>|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 ,---------.	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |------->	
 |Partition|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 `----+----'	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |<-	
 -	
 -	
 -	
 -	
 -	
 -	
 -	
 -	
 -	
 -	
 -	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 ,----------.	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |-------------------------------------------------->|SqoopSplit|	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 `----+-----'	
 

Copyright © 2013 NTT DATA Corporation

19
Sqoop2:  Map  phase  of  IMPORT  job
	
 
	
 	
 	
 	
 	
 	
 	
 ,-----------.	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 |SqoopMapper|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 `-----+-----'	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 run	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 --------->|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 ,-------------.	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |---------------------------------->|MapDataWriter|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 `------+------'	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
Implement  this
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 ,---------.	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |-------------->	
 |Extractor|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 `----+----'	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 extract	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |-------------------->|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
Conversion  to  Sqoop  
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
internal  data  format
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 read	
 from	
 DB	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 <-------------------------------|	
 	
 	
 	
 	
 	
 write*	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |------------------->|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 ,----.	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |---------->|Data|	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 `-+--'	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 context.write	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |-------------------------->	
 
	
 
Copyright © 2013 NTT DATA Corporation

20
Sqoop2:  Reduce  phase  of  EXPORT  job

	
 	
 ,-------.	
 	
 ,---------------------.	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 |Reducer|	
 	
 |SqoopNullOutputFormat|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 `---+---'	
 	
 `----------+----------'	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 ,-----------------------------.	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
Implement  this
	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |->	
 |SqoopOutputFormatLoadExecutor|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 `--------------+--------------'	
 	
 	
 	
 	
 	
 	
 	
 ,----.	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |--------------------->	
 |Data|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 `-+--'	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 ,-----------------.	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |->	
 |SqoopRecordWriter|	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 getRecordWriter	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 `--------+--------'	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
----------------------->|	
 getRecordWriter	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |----------------->|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 ,--------------.	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |----------------------------->	
 |ConsumerThread|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 `------+-------'	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |<-	
 -	
 -	
 -	
 -	
 -	
 -	
 -	
 -|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 ,------.	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
<-	
 -	
 -	
 -	
 -	
 -	
 -	
 -	
 -	
 -	
 -	
 -|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |--->|Loader|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 `--+---'	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 load	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 run	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |------>|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
----->|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 write	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 |------------------------------------------------>|	
 setContent	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 read*	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |----------->|	
 getContent	
 |<------|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |<-----------|	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 -	
 -	
 ->|	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 
	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 |	
 write	
 into	
 DB	
 	
 
	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 	
 |	
 	
 	
 	
 	
 	
 	
 |-------------->	
 

Copyright © 2013 NTT DATA Corporation

21
Summary

Copyright © 2013 NTT DATA Corporation

22
My  interest  for  popularizing  Sqoop  2

Complex  data  type  support  in  Sqoop  2
Bridge  to  use  Sqoop  1  connectors  on  Sqoop  2
Bridge  to  use  Sqoop  2  connectors  from  Sqoop  1  CLI

Copyright © 2013 NTT DATA Corporation

23
Copyright © 2011 NTT DATA Corporation

Copyright © 2013 NTT DATA Corporation

More Related Content

What's hot

A git workflow for Drupal Core development
A git workflow for Drupal Core developmentA git workflow for Drupal Core development
A git workflow for Drupal Core development
Cameron Tod
 
Athenticated smaba server config with open vpn
Athenticated smaba server  config with open vpnAthenticated smaba server  config with open vpn
Athenticated smaba server config with open vpn
Chanaka Lasantha
 
Ims06 change you can believe in
Ims06 change you can believe inIms06 change you can believe in
Ims06 change you can believe in
Robert Hain
 

What's hot (20)

SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)
 
제 8회 엑셈 수요 세미나 자료 연구컨텐츠팀
제 8회 엑셈 수요 세미나 자료 연구컨텐츠팀제 8회 엑셈 수요 세미나 자료 연구컨텐츠팀
제 8회 엑셈 수요 세미나 자료 연구컨텐츠팀
 
A git workflow for Drupal Core development
A git workflow for Drupal Core developmentA git workflow for Drupal Core development
A git workflow for Drupal Core development
 
Apend. networking linux
Apend. networking linuxApend. networking linux
Apend. networking linux
 
In-Network Acceleration with FPGA (MEMO)
In-Network Acceleration with FPGA (MEMO)In-Network Acceleration with FPGA (MEMO)
In-Network Acceleration with FPGA (MEMO)
 
Advanced RAC troubleshooting: Network
Advanced RAC troubleshooting: NetworkAdvanced RAC troubleshooting: Network
Advanced RAC troubleshooting: Network
 
A kind and gentle introducton to rac
A kind and gentle introducton to racA kind and gentle introducton to rac
A kind and gentle introducton to rac
 
Athenticated smaba server config with open vpn
Athenticated smaba server  config with open vpnAthenticated smaba server  config with open vpn
Athenticated smaba server config with open vpn
 
Mvcc in postgreSQL 권건우
Mvcc in postgreSQL 권건우Mvcc in postgreSQL 권건우
Mvcc in postgreSQL 권건우
 
Introduction to Parallel Execution
Introduction to Parallel ExecutionIntroduction to Parallel Execution
Introduction to Parallel Execution
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
 
Cisco vs. huawei CLI Commands
Cisco vs. huawei CLI CommandsCisco vs. huawei CLI Commands
Cisco vs. huawei CLI Commands
 
Ims06 change you can believe in
Ims06 change you can believe inIms06 change you can believe in
Ims06 change you can believe in
 
Replicate Oracle to Oracle, Oracle to MySQL, and Oracle to Analytics
Replicate Oracle to Oracle, Oracle to MySQL, and Oracle to AnalyticsReplicate Oracle to Oracle, Oracle to MySQL, and Oracle to Analytics
Replicate Oracle to Oracle, Oracle to MySQL, and Oracle to Analytics
 
OpenWorld Sep14 12c for_developers
OpenWorld Sep14 12c for_developersOpenWorld Sep14 12c for_developers
OpenWorld Sep14 12c for_developers
 
Ccnp3 lab 3_3_en
Ccnp3 lab 3_3_enCcnp3 lab 3_3_en
Ccnp3 lab 3_3_en
 
Amazon aurora 1
Amazon aurora 1Amazon aurora 1
Amazon aurora 1
 
Memcache as udp traffic reflector
Memcache as udp traffic reflectorMemcache as udp traffic reflector
Memcache as udp traffic reflector
 
Ccnp3 lab 3_4_en
Ccnp3 lab 3_4_enCcnp3 lab 3_4_en
Ccnp3 lab 3_4_en
 
OakTable World Sep14 clonedb
OakTable World Sep14 clonedb OakTable World Sep14 clonedb
OakTable World Sep14 clonedb
 

Viewers also liked

Строим собственную BI в MS Excel на данных из Яндекс.Метрики и Google Analitycs
Строим собственную BI в MS Excel на данных из Яндекс.Метрики и Google AnalitycsСтроим собственную BI в MS Excel на данных из Яндекс.Метрики и Google Analitycs
Строим собственную BI в MS Excel на данных из Яндекс.Метрики и Google Analitycs
Maxim Uvarov
 
Reworker предпочтовая подготовка
Reworker предпочтовая подготовкаReworker предпочтовая подготовка
Reworker предпочтовая подготовка
Nikita Florinskiy
 
PwC Sports Outlook 2016
PwC Sports Outlook 2016PwC Sports Outlook 2016
PwC Sports Outlook 2016
Jonesy033
 

Viewers also liked (20)

Retail crm
Retail crmRetail crm
Retail crm
 
Intellectual property in social media and online resources. Dmytro Gadomsky
Intellectual property in social media and online resources. Dmytro GadomskyIntellectual property in social media and online resources. Dmytro Gadomsky
Intellectual property in social media and online resources. Dmytro Gadomsky
 
14 albrieu&amp;baruzzi 2016 comparación normasdnv1967 2010
14 albrieu&amp;baruzzi 2016 comparación normasdnv1967 201014 albrieu&amp;baruzzi 2016 comparación normasdnv1967 2010
14 albrieu&amp;baruzzi 2016 comparación normasdnv1967 2010
 
VidaEconomica_21NOV
VidaEconomica_21NOVVidaEconomica_21NOV
VidaEconomica_21NOV
 
Fine motor
Fine motorFine motor
Fine motor
 
Fine Motor 2
Fine Motor 2Fine Motor 2
Fine Motor 2
 
Israel dop curvas horizontales eua&amp;europa r
Israel dop curvas horizontales eua&amp;europa rIsrael dop curvas horizontales eua&amp;europa r
Israel dop curvas horizontales eua&amp;europa r
 
Cryptocurrencies for Everyone (Dmytro Pershyn Technology Stream)
Cryptocurrencies for Everyone (Dmytro Pershyn Technology Stream)Cryptocurrencies for Everyone (Dmytro Pershyn Technology Stream)
Cryptocurrencies for Everyone (Dmytro Pershyn Technology Stream)
 
D drops Vitamin D za stare i mlade
D drops Vitamin D za stare i mladeD drops Vitamin D za stare i mlade
D drops Vitamin D za stare i mlade
 
What is it to be a senior engineer?
What is it to be a senior engineer?What is it to be a senior engineer?
What is it to be a senior engineer?
 
До Дня Незалежності України
До Дня Незалежності УкраїниДо Дня Незалежності України
До Дня Незалежності України
 
Захист прав інтелектуальної власності. Носік Ю.В.
Захист прав інтелектуальної власності. Носік Ю.В.Захист прав інтелектуальної власності. Носік Ю.В.
Захист прав інтелектуальної власності. Носік Ю.В.
 
E05912226
E05912226E05912226
E05912226
 
How to Create Digital Disruption (Alejandro Danylyszyn Business Stream)
How to Create Digital Disruption (Alejandro Danylyszyn Business Stream)How to Create Digital Disruption (Alejandro Danylyszyn Business Stream)
How to Create Digital Disruption (Alejandro Danylyszyn Business Stream)
 
Строим собственную BI в MS Excel на данных из Яндекс.Метрики и Google Analitycs
Строим собственную BI в MS Excel на данных из Яндекс.Метрики и Google AnalitycsСтроим собственную BI в MS Excel на данных из Яндекс.Метрики и Google Analitycs
Строим собственную BI в MS Excel на данных из Яндекс.Метрики и Google Analitycs
 
Reworker предпочтовая подготовка
Reworker предпочтовая подготовкаReworker предпочтовая подготовка
Reworker предпочтовая подготовка
 
PwC Sports Outlook 2016
PwC Sports Outlook 2016PwC Sports Outlook 2016
PwC Sports Outlook 2016
 
Failure Analysis of Gas Turbine Blade
Failure Analysis of Gas Turbine BladeFailure Analysis of Gas Turbine Blade
Failure Analysis of Gas Turbine Blade
 
EBD CPAD Lições bíblicas 1° trimestre 2016 lição 3 Esperando a volta de Jesus.
EBD CPAD Lições bíblicas 1° trimestre 2016 lição 3 Esperando a volta de Jesus.EBD CPAD Lições bíblicas 1° trimestre 2016 lição 3 Esperando a volta de Jesus.
EBD CPAD Lições bíblicas 1° trimestre 2016 lição 3 Esperando a volta de Jesus.
 
Сучасні форми фінансування комерціалізації інтелектуальної власності. Володим...
Сучасні форми фінансування комерціалізації інтелектуальної власності. Володим...Сучасні форми фінансування комерціалізації інтелектуальної власності. Володим...
Сучасні форми фінансування комерціалізації інтелектуальної власності. Володим...
 

Similar to Complex stories about Sqooping PostgreSQL data

Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
Roland Bouman
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
Roland Bouman
 
Oracle Basics and Architecture
Oracle Basics and ArchitectureOracle Basics and Architecture
Oracle Basics and Architecture
Sidney Chen
 

Similar to Complex stories about Sqooping PostgreSQL data (20)

PostgreSQL 13 New Features
PostgreSQL 13 New FeaturesPostgreSQL 13 New Features
PostgreSQL 13 New Features
 
Compute 101 - OpenStack Summit Vancouver 2015
Compute 101 - OpenStack Summit Vancouver 2015Compute 101 - OpenStack Summit Vancouver 2015
Compute 101 - OpenStack Summit Vancouver 2015
 
MySQLinsanity
MySQLinsanityMySQLinsanity
MySQLinsanity
 
Couch to OpenStack: Glance - July, 23, 2013
Couch to OpenStack: Glance - July, 23, 2013Couch to OpenStack: Glance - July, 23, 2013
Couch to OpenStack: Glance - July, 23, 2013
 
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
 
20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_Place20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_Place
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda en
 
TrinityCore server install guide
TrinityCore server install guideTrinityCore server install guide
TrinityCore server install guide
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
 
How to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with GaleraHow to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with Galera
 
Pro PostgreSQL
Pro PostgreSQLPro PostgreSQL
Pro PostgreSQL
 
Oracle Basics and Architecture
Oracle Basics and ArchitectureOracle Basics and Architecture
Oracle Basics and Architecture
 
Curso de MySQL 5.7
Curso de MySQL 5.7Curso de MySQL 5.7
Curso de MySQL 5.7
 
[Pgday.Seoul 2019] Citus를 이용한 분산 데이터베이스
[Pgday.Seoul 2019] Citus를 이용한 분산 데이터베이스[Pgday.Seoul 2019] Citus를 이용한 분산 데이터베이스
[Pgday.Seoul 2019] Citus를 이용한 분산 데이터베이스
 
NetDevOps 202: Life After Configuration
NetDevOps 202: Life After ConfigurationNetDevOps 202: Life After Configuration
NetDevOps 202: Life After Configuration
 
20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multi20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multi
 
Stress your DUT
Stress your DUTStress your DUT
Stress your DUT
 
PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...
PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...
PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...
 
Training Slides: Advanced 304: Upgrading From Native MySQL Replication To Tun...
Training Slides: Advanced 304: Upgrading From Native MySQL Replication To Tun...Training Slides: Advanced 304: Upgrading From Native MySQL Replication To Tun...
Training Slides: Advanced 304: Upgrading From Native MySQL Replication To Tun...
 

More from NTT DATA OSS Professional Services

More from NTT DATA OSS Professional Services (20)

Global Top 5 を目指す NTT DATA の確かで意外な技術力
Global Top 5 を目指す NTT DATA の確かで意外な技術力Global Top 5 を目指す NTT DATA の確かで意外な技術力
Global Top 5 を目指す NTT DATA の確かで意外な技術力
 
Spark SQL - The internal -
Spark SQL - The internal -Spark SQL - The internal -
Spark SQL - The internal -
 
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
 
Hadoopエコシステムのデータストア振り返り
Hadoopエコシステムのデータストア振り返りHadoopエコシステムのデータストア振り返り
Hadoopエコシステムのデータストア振り返り
 
HDFS Router-based federation
HDFS Router-based federationHDFS Router-based federation
HDFS Router-based federation
 
PostgreSQL10を導入!大規模データ分析事例からみるDWHとしてのPostgreSQL活用のポイント
PostgreSQL10を導入!大規模データ分析事例からみるDWHとしてのPostgreSQL活用のポイントPostgreSQL10を導入!大規模データ分析事例からみるDWHとしてのPostgreSQL活用のポイント
PostgreSQL10を導入!大規模データ分析事例からみるDWHとしてのPostgreSQL活用のポイント
 
Apache Hadoopの新機能Ozoneの現状
Apache Hadoopの新機能Ozoneの現状Apache Hadoopの新機能Ozoneの現状
Apache Hadoopの新機能Ozoneの現状
 
Distributed data stores in Hadoop ecosystem
Distributed data stores in Hadoop ecosystemDistributed data stores in Hadoop ecosystem
Distributed data stores in Hadoop ecosystem
 
Structured Streaming - The Internal -
Structured Streaming - The Internal -Structured Streaming - The Internal -
Structured Streaming - The Internal -
 
Apache Hadoopの未来 3系になって何が変わるのか?
Apache Hadoopの未来 3系になって何が変わるのか?Apache Hadoopの未来 3系になって何が変わるのか?
Apache Hadoopの未来 3系になって何が変わるのか?
 
Apache Hadoop and YARN, current development status
Apache Hadoop and YARN, current development statusApache Hadoop and YARN, current development status
Apache Hadoop and YARN, current development status
 
HDFS basics from API perspective
HDFS basics from API perspectiveHDFS basics from API perspective
HDFS basics from API perspective
 
SIerとオープンソースの美味しい関係 ~コミュニティの力を活かして世界を目指そう~
SIerとオープンソースの美味しい関係 ~コミュニティの力を活かして世界を目指そう~SIerとオープンソースの美味しい関係 ~コミュニティの力を活かして世界を目指そう~
SIerとオープンソースの美味しい関係 ~コミュニティの力を活かして世界を目指そう~
 
20170303 java9 hadoop
20170303 java9 hadoop20170303 java9 hadoop
20170303 java9 hadoop
 
ブロックチェーンの仕組みと動向(入門編)
ブロックチェーンの仕組みと動向(入門編)ブロックチェーンの仕組みと動向(入門編)
ブロックチェーンの仕組みと動向(入門編)
 
Application of postgre sql to large social infrastructure jp
Application of postgre sql to large social infrastructure jpApplication of postgre sql to large social infrastructure jp
Application of postgre sql to large social infrastructure jp
 
Application of postgre sql to large social infrastructure
Application of postgre sql to large social infrastructureApplication of postgre sql to large social infrastructure
Application of postgre sql to large social infrastructure
 
Apache Hadoop 2.8.0 の新機能 (抜粋)
Apache Hadoop 2.8.0 の新機能 (抜粋)Apache Hadoop 2.8.0 の新機能 (抜粋)
Apache Hadoop 2.8.0 の新機能 (抜粋)
 
データ活用をもっともっと円滑に! ~データ処理・分析基盤編を少しだけ~
データ活用をもっともっと円滑に!~データ処理・分析基盤編を少しだけ~データ活用をもっともっと円滑に!~データ処理・分析基盤編を少しだけ~
データ活用をもっともっと円滑に! ~データ処理・分析基盤編を少しだけ~
 
商用ミドルウェアのPuppet化で気を付けたい5つのこと
商用ミドルウェアのPuppet化で気を付けたい5つのこと商用ミドルウェアのPuppet化で気を付けたい5つのこと
商用ミドルウェアのPuppet化で気を付けたい5つのこと
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 

Complex stories about Sqooping PostgreSQL data

  • 1. Presentation  slide  for  Sqoop  User  Meetup  (Strata  +  Hadoop  World  NYC  2013)   Complex  stories  about Sqooping  PostgreSQL  data 10/28/2013 NTT  DATA  Corporation Masatake  Iwasaki Copyright © 2013 NTT DATA Corporation
  • 2. Introduction Copyright © 2013 NTT DATA Corporation 2
  • 3. About  Me Masatake  Iwasaki: Software  Engineer  @  NTT  DATA: NTT(Nippon  Telegraph  and  Telephone  Corporation):  Telecommunication NTT  DATA:  Systems  Integrator Developed: Ludia:  Fulltext  search  index  for  PostgreSQL  using  Senna Authored: “A  Complete  Primer  for  Hadoop”   (no  official  English  title) Patches  for  Sqoop: SQOOP-‐‑‒390:  PostgreSQL  connector  for  direct  export  with  pg_̲bulkload SQOOP-‐‑‒999:  Support  bulk  load  from  HDFS  to  PostgreSQL  using  COPY  ...  FROM   SQOOP-‐‑‒1155:  Sqoop  2  documentation  for  connector  development   Copyright © 2013 NTT DATA Corporation 3
  • 4. Why  PostgreSQL? Enterprisy from  earlier  version comparing  to  MySQL Active  community  in  Japan NTT  DATA  commits  itself  to  development Copyright © 2013 NTT DATA Corporation 4
  • 5. Sqooping  PostgreSQL  data Copyright © 2013 NTT DATA Corporation 5
  • 6. Working  around  PostgreSQL Direct  connector  for  PostgreSQL  loader:    SQOOP-‐‑‒390:  PostgreSQL  connector  for  direct  export  with  pg_̲bulkload Yet  another  direct  connector  for  PostgreSQL  JDBC:    SQOOP-‐‑‒999:  Support  bulk  load  from  HDFS  to  PostgreSQL                                            using  COPY  ...  FROM Supporting  complex  data  types:    SQOOP-‐‑‒1149:  Support  Custom  Postgres  Types   Copyright © 2013 NTT DATA Corporation 6
  • 7. Direct  connector  for  PostgreSQL  loader SQOOP-‐‑‒390:      PostgreSQL  connector  for  direct  export  with  pg_̲bulkload pg_̲bulkload: Data  loader  for  PostgreSQL Server  side  plug-‐‑‒in  library  and  client  side  command Providing  filtering  and  transformation  of  data http://pgbulkload.projects.pgfoundry.org/ Copyright © 2013 NTT DATA Corporation 7
  • 8. SQOOP-‐‑‒390:      PostgreSQL  connector  for  direct  export  with  pg_̲bulkload HDFS File  Split File  Split File  Split CREATE  TABLE    tmp3(LIKE  dest  INCLUDING  CONSTRAINTS) Mapper Mapper Mapper pg_̲bulkload pg_̲bulkoad pg_̲bulkload tmp1 PostgreSQL tmp2 Reducer Destination  Table Copyright © 2013 NTT DATA Corporation tmp3 external  process staging  table  per  mapper  is  must due  to  table  level  locks BEGIN INSERT  INTO  dest  (  SELECT  *  FROM  tmp1  ) DROP  TABLE  tmp1 INSERT  INTO  dest  (  SELECT  *  FROM  tmp2  ) DROP  TABLE  tmp2 INSERT  INTO  dest  (  SELECT  *  FROM  tmp3  ) DROP  TABLE  tmp3 COMMIT 8
  • 9. Direct  connector  for  PostgreSQL  loader Pros: Fast by  short-‐‑‒circuitting  server  functionality Flexible filtering  error  records Cons: Not  so  fast Bottleneck  is  not  in  client  side  but  in  DB  side Built-‐‑‒in  COPY  functionality  is  fast  enough Not  General pg_̲bulkload  supports  only  export Requiring  setup  on  all  slave  nodes  and  client  node Possible  to  Require  recovery  on  failure Copyright © 2013 NTT DATA Corporation 9
  • 10. Yet  another  direct  connector  for  PostgreSQL  JDBC PostgreSQL  provides  custom  SQL  command  for  data  import/export COPY table_name [ ( column_name [, ...] ) ] FROM { 'filename' | STDIN } [ [ WITH ] ( option [, ...] ) ] COPY { table_name [ ( column_name [, ...] ) ] | ( query ) } TO { 'filename' | STDOUT } [ [ WITH ] ( option [, ...] ) ] where option can be one of: FORMAT format_name OIDS [ boolean ] DELIMITER 'delimiter_character' NULL 'null_string' HEADER [ boolean ] QUOTE 'quote_character' ESCAPE 'escape_character' FORCE_QUOTE { ( column_name [, ...] ) | * } FORCE_NOT_NULL ( column_name [, ...] ) ENCODING 'encoding_name‘ AND  JDBC  API org.postgresql.copy.* Copyright © 2013 NTT DATA Corporation 10
  • 11. SQOOP-‐‑‒999:    Support  bulk  load  from  HDFS  to  PostgreSQL  using  COPY  ...  FROM File  Split File  Split File  Split Mapper Mapper Mapper PostgreSQL JDBC HDFS PostgreSQL JDBC PostgreSQL JDBC Using  custom  SQL  command   via  JDBC  API only  available  in  PostgreSQL COPY  FROM  STDIN  WITH  ... staging  tagle PostgreSQL Destination  Table Copyright © 2013 NTT DATA Corporation 11
  • 12. SQOOP-‐‑‒999:    Support  bulk  load  from  HDFS  to  PostgreSQL  using  COPY  ...  FROM   import org.postgresql.copy.CopyManager; import org.postgresql.copy.CopyIn; ... Requiring  PostgreSQL   specific  interface. protected void setup(Context context) ... dbConf = new DBConfiguration(conf); CopyManager cm = null; ... public void map(LongWritable key, Writable value, Context context) ... if (value instanceof Text) { line.append(System.getProperty("line.separator")); } try { byte[]data = line.toString().getBytes("UTF-8"); Just  feeding  lines  of  text copyin.writeToCopy(data, 0, data.length); Copyright © 2013 NTT DATA Corporation 12
  • 13. Yet  another  direct  connector  for  PostgreSQL  JDBC Pros: Fast  enough Ease  of  use JDBC  driver  jar  is  distributed  automatically  by  MR  framework Cons: Dependency  on  not  general  JDBC possible  licensing  issue  (PostgreSQL  is  OK,  itʼ’s  BSD  Lisence) build  time  requirement  (PostgreSQL  JDBC  is  available  in  Maven  repo.) <dependency org="org.postgresql" name="postgresql" rev="${postgresql.version}" conf="common->default" /> Error  record  causes  rollback  of  whole  transaction Still  difficult  to  implement  custom  connector  for  IMPORT because  of  code  generation  part Copyright © 2013 NTT DATA Corporation 13
  • 14. Supporting  complex  data  types PostgreSQL  supports  lot  of  complex  data  types   Geometric  Types Points Line  Segments Boxes Paths Polygons Circles Network  Address  Types inet cidr macaddr XML  Type JSON  Type Supporting  complex  data  types:    SQOOP-‐‑‒1149:  Support  Custom  Postgres  Types Copyright © 2013 NTT DATA Corporation not  me 14
  • 15. Constraints  on  JDBC  data  types  in  Sqoop  framework protected Map<String, Integer> getColumnTypesForRawQuery(String stmt) { ... results = execute(stmt); ... ResultSetMetaData metadata = results.getMetaData(); for (int i = 1; i < cols + 1; i++) { int typeId = metadata.getColumnType(i); returns  java.sql.Types.OTHER  for  types   { not  mappable  to  basic    Java  data  types =>  Losing  type  imformation     public String toJavaType(int sqlType) // Mappings taken from: // http://java.sun.com/j2se/1.3/docs/guide/jdbc/getstart/mapping.html if (sqlType == Types.INTEGER) { return "Integer"; } else if (sqlType == Types.VARCHAR) { return "String"; .... reaches  here } else { // TODO(aaron): Support DISTINCT, ARRAY, STRUCT, REF, JAVA_OBJECT. // Return null indicating database-specific manager should return a // java data type if it can find one for any nonstandard type. return null; Copyright © 2013 NTT DATA Corporation 15
  • 16. Sqoop  1  Summary Pros: Simple  Standalone  MapReduce  Driver Easy  to  understand  for  MR  application  developpers                                                                            except  for  ORM  (SqoopRecord)  code  generation  part. Variety  of  connectors Lot  of  information Cons: Complex  command  line  and  inconsistent  options meaning  of  options  is  according  to  connectors Not  enough  modular Dependency  on  JDBC  data  model Security Copyright © 2013 NTT DATA Corporation 16
  • 17. Sqooping  PostgreSQL  Data  2 Copyright © 2013 NTT DATA Corporation 17
  • 18. Sqoop  2 Everything  are  rewritten Working  on  server  side More  modular Not  compatible  with  Sqoop  1  at  all (Almost)  Only  generic  connector Black  box  comparing  to  Sqoop  1 Needs  more  documentation SQOOP-‐‑‒1155:  Sqoop  2  documentation  for  connector  development Internal of Sqoop2 MapReduce Job ++++++++++++++++++++++++++++++++ ... - OutputFormat invokes Loader's load method (via SqoopOutputFor .. todo: sequence diagram like figure. Copyright © 2013 NTT DATA Corporation 18
  • 19. Sqoop2:  Initialization  phase  of  IMPORT  job Implement  this ,----------------. ,-----------. |SqoopInputFormat| |Partitioner| `-------+--------' `-----+-----' getSplits | | ----------->| | | getPartitions | |------------------------>| | | ,---------. | |-------> |Partition| | | `----+----' |<- - - - - - - - - - - - | | | | | ,----------. |-------------------------------------------------->|SqoopSplit| | | | `----+-----' Copyright © 2013 NTT DATA Corporation 19
  • 20. Sqoop2:  Map  phase  of  IMPORT  job ,-----------. |SqoopMapper| `-----+-----' run | --------->| ,-------------. |---------------------------------->|MapDataWriter| | `------+------' Implement  this | ,---------. | |--------------> |Extractor| | | `----+----' | | extract | | |-------------------->| | Conversion  to  Sqoop   | | | internal  data  format read from DB | | <-------------------------------| write* | | |------------------->| | | | ,----. | | |---------->|Data| | | | `-+--' | | | | | | context.write | | |--------------------------> Copyright © 2013 NTT DATA Corporation 20
  • 21. Sqoop2:  Reduce  phase  of  EXPORT  job ,-------. ,---------------------. |Reducer| |SqoopNullOutputFormat| `---+---' `----------+----------' | | ,-----------------------------. Implement  this | |-> |SqoopOutputFormatLoadExecutor| | | `--------------+--------------' ,----. | | |---------------------> |Data| | | | `-+--' | | | ,-----------------. | | | |-> |SqoopRecordWriter| | getRecordWriter | | `--------+--------' | ----------------------->| getRecordWriter | | | | |----------------->| | | ,--------------. | | |-----------------------------> |ConsumerThread| | | | | | `------+-------' | |<- - - - - - - - -| | | | ,------. <- - - - - - - - - - - -| | | | |--->|Loader| | | | | | | `--+---' | | | | | | | | | | | | | load | run | | | | | |------>| ----->| | write | | | | | |------------------------------------------------>| setContent | | read* | | | | |----------->| getContent |<------| | | | | |<-----------| | | | | | | | - - ->| | | | | | | | write into DB | | | | | | |--------------> Copyright © 2013 NTT DATA Corporation 21
  • 22. Summary Copyright © 2013 NTT DATA Corporation 22
  • 23. My  interest  for  popularizing  Sqoop  2 Complex  data  type  support  in  Sqoop  2 Bridge  to  use  Sqoop  1  connectors  on  Sqoop  2 Bridge  to  use  Sqoop  2  connectors  from  Sqoop  1  CLI Copyright © 2013 NTT DATA Corporation 23
  • 24. Copyright © 2011 NTT DATA Corporation Copyright © 2013 NTT DATA Corporation