SlideShare ist ein Scribd-Unternehmen logo
1 von 64
Downloaden Sie, um offline zu lesen
Outline
                       Introduction
                       Architecture
    Configuration examples & Usage
           Current status & TODO




PgLoader, the parallel ETL for PostgreSQL

                         Dimitri Fontaine


                        October 17, 2008




                   Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                  Introduction
                                  Architecture
               Configuration examples & Usage
                      Current status & TODO


Table of contents


  1   Introduction
         pgloader, the what?

  2   Architecture
        Main components
        Parallel Organisation

  3   Configuration examples & Usage

  4   Current status & TODO



                              Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                 Introduction
                                 Architecture   pgloader, the what?
              Configuration examples & Usage
                     Current status & TODO


ETL



 Definition
 An ETL process data to load into the database from a flat file.

   1   Extract
   2   Transform
   3   Load




                             Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                Architecture   pgloader, the what?
             Configuration examples & Usage
                    Current status & TODO


pgloader’s features



  PGLoader will:
      Load CSV data




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                Architecture   pgloader, the what?
             Configuration examples & Usage
                    Current status & TODO


pgloader’s features



  PGLoader will:
      Load CSV data
      Load pretend-to-be CSV data




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                Architecture   pgloader, the what?
             Configuration examples & Usage
                    Current status & TODO


pgloader’s features



  PGLoader will:
      Load CSV data
      Load pretend-to-be CSV data
      Continue loading when confronted to errors




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                Architecture   pgloader, the what?
             Configuration examples & Usage
                    Current status & TODO


pgloader’s features



  PGLoader will:
      Load CSV data
      Load pretend-to-be CSV data
      Continue loading when confronted to errors
      Apply user define transformation to data, on the fly




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                Architecture   pgloader, the what?
             Configuration examples & Usage
                    Current status & TODO


pgloader’s features



  PGLoader will:
      Load CSV data
      Load pretend-to-be CSV data
      Continue loading when confronted to errors
      Apply user define transformation to data, on the fly
      Optionaly have all your cores participate into processing




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Configuration


  We first parse the configuration, with templating system
  Example
  [simple]
  use_template     =   simple_tmpl
  table            =   simple
  filename         =   simple/simple.data
  columns          =   a:1, b:3, c:2




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Loading: file reading


  PGLoader supports many input formats, even if they all look like
  CSV, the rough time is parsing data:
      Read files one line at a time




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Loading: file reading


  PGLoader supports many input formats, even if they all look like
  CSV, the rough time is parsing data:
      Read files one line at a time
      Parse physical lines into logical lines




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Loading: file reading


  PGLoader supports many input formats, even if they all look like
  CSV, the rough time is parsing data:
      Read files one line at a time
      Parse physical lines into logical lines
      Supports several readers

           csvreader




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Loading: file reading


  PGLoader supports many input formats, even if they all look like
  CSV, the rough time is parsing data:
      Read files one line at a time
      Parse physical lines into logical lines
      Supports several readers
           textreader
           csvreader




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Loading: file reading


  PGLoader supports many input formats, even if they all look like
  CSV, the rough time is parsing data:
      Read files one line at a time
      Parse physical lines into logical lines
      Supports several readers
           textreader
           csvreader
           fixedreader




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Processing lines



  Parsing data is the CPU intensive part of the job. You could even
  have to guess where lines begin and end.




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Processing lines



  Parsing data is the CPU intensive part of the job. You could even
  have to guess where lines begin and end. Then you add:
      columns restrictions




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Processing lines



  Parsing data is the CPU intensive part of the job. You could even
  have to guess where lines begin and end. Then you add:
      columns restrictions
      columns reordering




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Processing lines



  Parsing data is the CPU intensive part of the job. You could even
  have to guess where lines begin and end. Then you add:
      columns restrictions
      columns reordering
      user defined columns (constants)




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Processing lines



  Parsing data is the CPU intensive part of the job. You could even
  have to guess where lines begin and end. Then you add:
      columns restrictions
      columns reordering
      user defined columns (constants)
      user defined reformating modules




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                 Introduction
                                                Main components
                                 Architecture
                                                Parallel Organisation
              Configuration examples & Usage
                     Current status & TODO


COPYing to PostgreSQL



  This is how we do it:
      python cStringIO buffers




                             Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                 Introduction
                                                Main components
                                 Architecture
                                                Parallel Organisation
              Configuration examples & Usage
                     Current status & TODO


COPYing to PostgreSQL



  This is how we do it:
      python cStringIO buffers
      configurable size (copy every)




                             Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                 Introduction
                                                Main components
                                 Architecture
                                                Parallel Organisation
              Configuration examples & Usage
                     Current status & TODO


COPYing to PostgreSQL



  This is how we do it:
      python cStringIO buffers
      configurable size (copy every)
      using copy expert() when available (CVS)




                             Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                 Introduction
                                                Main components
                                 Architecture
                                                Parallel Organisation
              Configuration examples & Usage
                     Current status & TODO


COPYing to PostgreSQL



  This is how we do it:
      python cStringIO buffers
      configurable size (copy every)
      using copy expert() when available (CVS)
      dichotomic error search




                             Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Handling of erroneous data input



  PGLoader will continue processing your input when it contains
  erroneous data.
      reject data file




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Handling of erroneous data input



  PGLoader will continue processing your input when it contains
  erroneous data.
      reject data file
      reject log file, containing error messages




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Handling of erroneous data input



  PGLoader will continue processing your input when it contains
  erroneous data.
      reject data file
      reject log file, containing error messages
      errors count in summary




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                   Introduction
                                                  Main components
                                   Architecture
                                                  Parallel Organisation
                Configuration examples & Usage
                       Current status & TODO


Error logging



  PGLoader will continue processing your input when it contains
  erroneous data, and will make it so that you know about the
  failures.
      log file




                               Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                   Introduction
                                                  Main components
                                   Architecture
                                                  Parallel Organisation
                Configuration examples & Usage
                       Current status & TODO


Error logging



  PGLoader will continue processing your input when it contains
  erroneous data, and will make it so that you know about the
  failures.
      log file
      console log level: client min messages




                               Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                   Introduction
                                                  Main components
                                   Architecture
                                                  Parallel Organisation
                Configuration examples & Usage
                       Current status & TODO


Error logging



  PGLoader will continue processing your input when it contains
  erroneous data, and will make it so that you know about the
  failures.
      log file
      console log level: client min messages
      logfile log level: log min messages




                               Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Why going parallel?




  Loading is IO bound, not CPU bound, right?




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Why going parallel?




  Loading is IO bound, not CPU bound, right?
      for large disks array, not so much




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Why going parallel?




  Loading is IO bound, not CPU bound, right?
      for large disks array, not so much
      with complex parsing, not so much




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Why going parallel?




  Loading is IO bound, not CPU bound, right?
      for large disks array, not so much
      with complex parsing, not so much
      with heavy user rewritting, not so much




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Ok... How?


     mutli-threading is easy to start with in python




  Example
      class PGLoader(threading.Thread):




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Ok... How?


     mutli-threading is easy to start with in python
     then you add in dequeues and semaphores (critical sections)
     and signals




  Example
      class PGLoader(threading.Thread):




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Ok... How?


     mutli-threading is easy to start with in python
     then you add in dequeues and semaphores (critical sections)
     and signals
     Giant Interpreter Lock



  Example
      class PGLoader(threading.Thread):




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Ok... How?


     mutli-threading is easy to start with in python
     then you add in dequeues and semaphores (critical sections)
     and signals
     Giant Interpreter Lock
     fork() based reimplementation could be of interrest

  Example
      class PGLoader(threading.Thread):




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Parallelism choices


  Has beed asked by some hackers, their use cases dictated two
  different modes.




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                 Introduction
                                                Main components
                                 Architecture
                                                Parallel Organisation
              Configuration examples & Usage
                     Current status & TODO


Parallelism choices


  Has beed asked by some hackers, their use cases dictated two
  different modes.

  The idea is to have a parallel pg restore testbed, interresting with
  large input files (100GB to several TB). PGLoader’s can’t compete
  to plain COPY, due to clientserver roundtrips compared to local file
  reading, but with some more CPUs feeding the disk array, should
  show up nice improvements.




                             Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                 Introduction
                                                Main components
                                 Architecture
                                                Parallel Organisation
              Configuration examples & Usage
                     Current status & TODO


Parallelism choices


  Has beed asked by some hackers, their use cases dictated two
  different modes.

  The idea is to have a parallel pg restore testbed, interresting with
  large input files (100GB to several TB). PGLoader’s can’t compete
  to plain COPY, due to clientserver roundtrips compared to local file
  reading, but with some more CPUs feeding the disk array, should
  show up nice improvements.

  Testing and feeback more than welcome!



                             Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                 Introduction
                                                Main components
                                 Architecture
                                                Parallel Organisation
              Configuration examples & Usage
                     Current status & TODO


Round robin reader

  Parsing is all done by a single thread for all the content.




                             Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                 Introduction
                                                Main components
                                 Architecture
                                                Parallel Organisation
              Configuration examples & Usage
                     Current status & TODO


Round robin reader

  Parsing is all done by a single thread for all the content.

  N readers are started and get each a queue where to fill this round
  data, and issue COPY while main reader continue parsing.




                             Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                 Introduction
                                                Main components
                                 Architecture
                                                Parallel Organisation
              Configuration examples & Usage
                     Current status & TODO


Round robin reader

  Parsing is all done by a single thread for all the content.

  N readers are started and get each a queue where to fill this round
  data, and issue COPY while main reader continue parsing.

  Example
  [rrr]
  section_threads    = 3
  split_file_reading = False




                             Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Split file reader


  The file is split into N blocks and there’s as much pgloader doing
  the same job in parallel as there are blocks.




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                               Main components
                                Architecture
                                               Parallel Organisation
             Configuration examples & Usage
                    Current status & TODO


Split file reader


  The file is split into N blocks and there’s as much pgloader doing
  the same job in parallel as there are blocks.

  Example
  [rrr]
  section_threads    = 3
  split_file_reading = True




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                Architecture
             Configuration examples & Usage
                    Current status & TODO


Examples




  PGLoader distribution comes with diverse examples, don’t forget
  to see about them.




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                Architecture
             Configuration examples & Usage
                    Current status & TODO


simple

  That simple:




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                Architecture
             Configuration examples & Usage
                    Current status & TODO


simple

  That simple:

  Example
  [simple]
  table            =   simple
  filename         =   simple/simple.data
  format           =   text
  datestyle        =   dmy
  field_sep        =   |
  trailing_sep     =   True
  columns          =   *


                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                Architecture
             Configuration examples & Usage
                    Current status & TODO


User defined columns

  Constant columns added at parsing time.




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                Architecture
             Configuration examples & Usage
                    Current status & TODO


User defined columns

  Constant columns added at parsing time.

  Use case: adding an origin server id field depending on the file to
  get loaded, for data aggregation.




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                Architecture
             Configuration examples & Usage
                    Current status & TODO


User defined columns

  Constant columns added at parsing time.

  Use case: adding an origin server id field depending on the file to
  get loaded, for data aggregation.

  Example
     [server_A]
     file                    =   imports/A.csv
     columns                 =   b:2, d:1, x:3, y:4
     udc_c                   =   A
     copy_columns            =   b, c, d


                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                Architecture
             Configuration examples & Usage
                    Current status & TODO


User defined Reformating modules
  The basic idea is to avoid any pre-processing done with another
  tool (sed, awk, you name it).




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                Architecture
             Configuration examples & Usage
                    Current status & TODO


User defined Reformating modules
  The basic idea is to avoid any pre-processing done with another
  tool (sed, awk, you name it).
   file has ’12131415’
  Example
  [fixed]
  table                  =    fixed
  format                 =    fixed
  filename               =    fixed/fixed.data
  columns                =    *
  fixed_specs            =    a:0:10, b:10:8, c:18:8, d:26:17
  reformat               =    c:pgtime:time



                             Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                Architecture
             Configuration examples & Usage
                    Current status & TODO


User defined Reformating modules
  The basic idea is to avoid any pre-processing done with another
  tool (sed, awk, you name it).
   file has ’12131415’ we want ’12:13:14.15’
  Example
  def time(reject, input):
      quot;quot;quot; Reformat str as a PostgreSQL time quot;quot;quot;
      if len(input) != 8:
          reject.log(mesg, input)

      hour       = input[0:2]
      ...
      return ’%s:%s:%s.%s’ % (hour, min, secs, cents)


                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                              Introduction
                              Architecture
           Configuration examples & Usage
                  Current status & TODO


The fine manual says it all



  At http://pgloader.projects.postgresql.org/ or man
  pgloader
  Example
  > pgloader --help
  > pgloader --version
  > pgloader -DTsc pgloader.conf




                          Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                             Introduction
                             Architecture
          Configuration examples & Usage
                 Current status & TODO


TODO



 http://pgloader.projects.postgresql.org/dev/TODO.html




                         Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                              Introduction
                              Architecture
           Configuration examples & Usage
                  Current status & TODO


TODO



 http://pgloader.projects.postgresql.org/dev/TODO.html
    Constraint Exclusion support




                          Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                              Introduction
                              Architecture
           Configuration examples & Usage
                  Current status & TODO


TODO



 http://pgloader.projects.postgresql.org/dev/TODO.html
    Constraint Exclusion support
    Reject Behaviour




                          Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                              Introduction
                              Architecture
           Configuration examples & Usage
                  Current status & TODO


TODO



 http://pgloader.projects.postgresql.org/dev/TODO.html
    Constraint Exclusion support
    Reject Behaviour
    XML support with user defined XSLT StyleSheet




                          Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                               Introduction
                               Architecture
            Configuration examples & Usage
                   Current status & TODO


TODO



 http://pgloader.projects.postgresql.org/dev/TODO.html
    Constraint Exclusion support
    Reject Behaviour
    XML support with user defined XSLT StyleSheet
    Facilities




                           Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                Architecture
             Configuration examples & Usage
                    Current status & TODO


TODO



 http://pgloader.projects.postgresql.org/dev/TODO.html
     Constraint Exclusion support
     Reject Behaviour
     XML support with user defined XSLT StyleSheet
     Facilities
 Don’t be shy and just ask for new features!




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                Architecture
             Configuration examples & Usage
                    Current status & TODO


Resources and Users



  pgfoundry, 1 developper, some users, no mailing list yet (no one
  asking for one), some mails sometime, seldom bug reports (fixed)




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                Architecture
             Configuration examples & Usage
                    Current status & TODO


Resources and Users



  pgfoundry, 1 developper, some users, no mailing list yet (no one
  asking for one), some mails sometime, seldom bug reports (fixed)

  Support ongoing at #postgresql and #postgresqlfr




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL
Outline
                                Introduction
                                Architecture
             Configuration examples & Usage
                    Current status & TODO


Resources and Users



  pgfoundry, 1 developper, some users, no mailing list yet (no one
  asking for one), some mails sometime, seldom bug reports (fixed)

  Support ongoing at #postgresql and #postgresqlfr

  packages for debian, FreeBSD, OpenBSD, CentOS, RHEL and
  Fedora.




                            Dimitri Fontaine   PgLoader, the parallel ETL for PostgreSQL

Weitere ähnliche Inhalte

Was ist angesagt?

Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...
Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...
Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...
Thomas Wuerthinger
 
Understanding Android Benchmarks
Understanding Android BenchmarksUnderstanding Android Benchmarks
Understanding Android Benchmarks
Koan-Sin Tan
 
UFT- New features and comparison with QTP
UFT- New features and comparison with QTPUFT- New features and comparison with QTP
UFT- New features and comparison with QTP
Rita Singh
 

Was ist angesagt? (9)

Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...
Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...
Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...
 
Micro-Benchmarking Considered Harmful
Micro-Benchmarking Considered HarmfulMicro-Benchmarking Considered Harmful
Micro-Benchmarking Considered Harmful
 
Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416
 
Tech_Implementation of Complex ITIM Workflows
Tech_Implementation of Complex ITIM WorkflowsTech_Implementation of Complex ITIM Workflows
Tech_Implementation of Complex ITIM Workflows
 
Graal and Truffle: One VM to Rule Them All
Graal and Truffle: One VM to Rule Them AllGraal and Truffle: One VM to Rule Them All
Graal and Truffle: One VM to Rule Them All
 
Understanding Android Benchmarks
Understanding Android BenchmarksUnderstanding Android Benchmarks
Understanding Android Benchmarks
 
Java byte code & virtual machine
Java byte code & virtual machineJava byte code & virtual machine
Java byte code & virtual machine
 
2CPP02 - C++ Primer
2CPP02 - C++ Primer2CPP02 - C++ Primer
2CPP02 - C++ Primer
 
UFT- New features and comparison with QTP
UFT- New features and comparison with QTPUFT- New features and comparison with QTP
UFT- New features and comparison with QTP
 

Andere mochten auch (7)

International Trends in Mobile Law
International Trends in Mobile LawInternational Trends in Mobile Law
International Trends in Mobile Law
 
AbundantlifeU/Merge Handout
AbundantlifeU/Merge HandoutAbundantlifeU/Merge Handout
AbundantlifeU/Merge Handout
 
A Day In The Life Of A Cyber Syndicate
A Day In The Life Of A Cyber SyndicateA Day In The Life Of A Cyber Syndicate
A Day In The Life Of A Cyber Syndicate
 
Presentation1
Presentation1Presentation1
Presentation1
 
Carteles
CartelesCarteles
Carteles
 
Cyber Crime in Government
Cyber Crime in GovernmentCyber Crime in Government
Cyber Crime in Government
 
Cyber Crime 101: The Impact of Cyber Crime on Higher Education in South Africa
Cyber Crime 101:  The Impact of Cyber Crime on Higher Education in South AfricaCyber Crime 101:  The Impact of Cyber Crime on Higher Education in South Africa
Cyber Crime 101: The Impact of Cyber Crime on Higher Education in South Africa
 

Ähnlich wie Pgloader

Small Overview of Skype Database Tools
Small Overview of Skype Database ToolsSmall Overview of Skype Database Tools
Small Overview of Skype Database Tools
elliando dias
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
Command Prompt., Inc
 
Database Tools by Skype
Database Tools by SkypeDatabase Tools by Skype
Database Tools by Skype
elliando dias
 

Ähnlich wie Pgloader (20)

Spring Batch Introduction
Spring Batch IntroductionSpring Batch Introduction
Spring Batch Introduction
 
Porting Oracle Applications to PostgreSQL
Porting Oracle Applications to PostgreSQLPorting Oracle Applications to PostgreSQL
Porting Oracle Applications to PostgreSQL
 
A Tour of PostgREST
A Tour of PostgRESTA Tour of PostgREST
A Tour of PostgREST
 
Small Overview of Skype Database Tools
Small Overview of Skype Database ToolsSmall Overview of Skype Database Tools
Small Overview of Skype Database Tools
 
Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7
 
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaA noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
 
Intro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data IntegrationIntro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data Integration
 
Context-aware Fast Food Recommendation with Ray on Apache Spark at Burger King
Context-aware Fast Food Recommendation with Ray on Apache Spark at Burger KingContext-aware Fast Food Recommendation with Ray on Apache Spark at Burger King
Context-aware Fast Food Recommendation with Ray on Apache Spark at Burger King
 
Presto
PrestoPresto
Presto
 
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
 
Challenges of angular in production (Tasos Bekos) - GreeceJS #17
Challenges of angular in production (Tasos Bekos) - GreeceJS #17Challenges of angular in production (Tasos Bekos) - GreeceJS #17
Challenges of angular in production (Tasos Bekos) - GreeceJS #17
 
Prestogres internals
Prestogres internalsPrestogres internals
Prestogres internals
 
PostgreSQL As Data Integration Tool
PostgreSQL As Data Integration ToolPostgreSQL As Data Integration Tool
PostgreSQL As Data Integration Tool
 
PGDay.Amsterdam 2018 - Stefanie Stoelting - PostgreSQL As Data Integration Tool
PGDay.Amsterdam 2018 - Stefanie Stoelting - PostgreSQL As Data Integration ToolPGDay.Amsterdam 2018 - Stefanie Stoelting - PostgreSQL As Data Integration Tool
PGDay.Amsterdam 2018 - Stefanie Stoelting - PostgreSQL As Data Integration Tool
 
MTaulty_DevWeek_Parallel
MTaulty_DevWeek_ParallelMTaulty_DevWeek_Parallel
MTaulty_DevWeek_Parallel
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
 
groonga with PostgreSQL
groonga with PostgreSQLgroonga with PostgreSQL
groonga with PostgreSQL
 
Database Tools by Skype
Database Tools by SkypeDatabase Tools by Skype
Database Tools by Skype
 
Building data pipelines
Building data pipelinesBuilding data pipelines
Building data pipelines
 

Pgloader

  • 1. Outline Introduction Architecture Configuration examples & Usage Current status & TODO PgLoader, the parallel ETL for PostgreSQL Dimitri Fontaine October 17, 2008 Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 2. Outline Introduction Architecture Configuration examples & Usage Current status & TODO Table of contents 1 Introduction pgloader, the what? 2 Architecture Main components Parallel Organisation 3 Configuration examples & Usage 4 Current status & TODO Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 3. Outline Introduction Architecture pgloader, the what? Configuration examples & Usage Current status & TODO ETL Definition An ETL process data to load into the database from a flat file. 1 Extract 2 Transform 3 Load Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 4. Outline Introduction Architecture pgloader, the what? Configuration examples & Usage Current status & TODO pgloader’s features PGLoader will: Load CSV data Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 5. Outline Introduction Architecture pgloader, the what? Configuration examples & Usage Current status & TODO pgloader’s features PGLoader will: Load CSV data Load pretend-to-be CSV data Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 6. Outline Introduction Architecture pgloader, the what? Configuration examples & Usage Current status & TODO pgloader’s features PGLoader will: Load CSV data Load pretend-to-be CSV data Continue loading when confronted to errors Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 7. Outline Introduction Architecture pgloader, the what? Configuration examples & Usage Current status & TODO pgloader’s features PGLoader will: Load CSV data Load pretend-to-be CSV data Continue loading when confronted to errors Apply user define transformation to data, on the fly Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 8. Outline Introduction Architecture pgloader, the what? Configuration examples & Usage Current status & TODO pgloader’s features PGLoader will: Load CSV data Load pretend-to-be CSV data Continue loading when confronted to errors Apply user define transformation to data, on the fly Optionaly have all your cores participate into processing Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 9. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Configuration We first parse the configuration, with templating system Example [simple] use_template = simple_tmpl table = simple filename = simple/simple.data columns = a:1, b:3, c:2 Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 10. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Loading: file reading PGLoader supports many input formats, even if they all look like CSV, the rough time is parsing data: Read files one line at a time Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 11. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Loading: file reading PGLoader supports many input formats, even if they all look like CSV, the rough time is parsing data: Read files one line at a time Parse physical lines into logical lines Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 12. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Loading: file reading PGLoader supports many input formats, even if they all look like CSV, the rough time is parsing data: Read files one line at a time Parse physical lines into logical lines Supports several readers csvreader Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 13. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Loading: file reading PGLoader supports many input formats, even if they all look like CSV, the rough time is parsing data: Read files one line at a time Parse physical lines into logical lines Supports several readers textreader csvreader Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 14. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Loading: file reading PGLoader supports many input formats, even if they all look like CSV, the rough time is parsing data: Read files one line at a time Parse physical lines into logical lines Supports several readers textreader csvreader fixedreader Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 15. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Processing lines Parsing data is the CPU intensive part of the job. You could even have to guess where lines begin and end. Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 16. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Processing lines Parsing data is the CPU intensive part of the job. You could even have to guess where lines begin and end. Then you add: columns restrictions Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 17. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Processing lines Parsing data is the CPU intensive part of the job. You could even have to guess where lines begin and end. Then you add: columns restrictions columns reordering Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 18. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Processing lines Parsing data is the CPU intensive part of the job. You could even have to guess where lines begin and end. Then you add: columns restrictions columns reordering user defined columns (constants) Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 19. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Processing lines Parsing data is the CPU intensive part of the job. You could even have to guess where lines begin and end. Then you add: columns restrictions columns reordering user defined columns (constants) user defined reformating modules Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 20. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO COPYing to PostgreSQL This is how we do it: python cStringIO buffers Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 21. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO COPYing to PostgreSQL This is how we do it: python cStringIO buffers configurable size (copy every) Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 22. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO COPYing to PostgreSQL This is how we do it: python cStringIO buffers configurable size (copy every) using copy expert() when available (CVS) Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 23. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO COPYing to PostgreSQL This is how we do it: python cStringIO buffers configurable size (copy every) using copy expert() when available (CVS) dichotomic error search Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 24. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Handling of erroneous data input PGLoader will continue processing your input when it contains erroneous data. reject data file Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 25. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Handling of erroneous data input PGLoader will continue processing your input when it contains erroneous data. reject data file reject log file, containing error messages Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 26. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Handling of erroneous data input PGLoader will continue processing your input when it contains erroneous data. reject data file reject log file, containing error messages errors count in summary Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 27. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Error logging PGLoader will continue processing your input when it contains erroneous data, and will make it so that you know about the failures. log file Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 28. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Error logging PGLoader will continue processing your input when it contains erroneous data, and will make it so that you know about the failures. log file console log level: client min messages Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 29. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Error logging PGLoader will continue processing your input when it contains erroneous data, and will make it so that you know about the failures. log file console log level: client min messages logfile log level: log min messages Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 30. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Why going parallel? Loading is IO bound, not CPU bound, right? Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 31. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Why going parallel? Loading is IO bound, not CPU bound, right? for large disks array, not so much Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 32. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Why going parallel? Loading is IO bound, not CPU bound, right? for large disks array, not so much with complex parsing, not so much Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 33. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Why going parallel? Loading is IO bound, not CPU bound, right? for large disks array, not so much with complex parsing, not so much with heavy user rewritting, not so much Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 34. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Ok... How? mutli-threading is easy to start with in python Example class PGLoader(threading.Thread): Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 35. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Ok... How? mutli-threading is easy to start with in python then you add in dequeues and semaphores (critical sections) and signals Example class PGLoader(threading.Thread): Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 36. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Ok... How? mutli-threading is easy to start with in python then you add in dequeues and semaphores (critical sections) and signals Giant Interpreter Lock Example class PGLoader(threading.Thread): Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 37. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Ok... How? mutli-threading is easy to start with in python then you add in dequeues and semaphores (critical sections) and signals Giant Interpreter Lock fork() based reimplementation could be of interrest Example class PGLoader(threading.Thread): Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 38. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Parallelism choices Has beed asked by some hackers, their use cases dictated two different modes. Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 39. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Parallelism choices Has beed asked by some hackers, their use cases dictated two different modes. The idea is to have a parallel pg restore testbed, interresting with large input files (100GB to several TB). PGLoader’s can’t compete to plain COPY, due to clientserver roundtrips compared to local file reading, but with some more CPUs feeding the disk array, should show up nice improvements. Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 40. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Parallelism choices Has beed asked by some hackers, their use cases dictated two different modes. The idea is to have a parallel pg restore testbed, interresting with large input files (100GB to several TB). PGLoader’s can’t compete to plain COPY, due to clientserver roundtrips compared to local file reading, but with some more CPUs feeding the disk array, should show up nice improvements. Testing and feeback more than welcome! Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 41. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Round robin reader Parsing is all done by a single thread for all the content. Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 42. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Round robin reader Parsing is all done by a single thread for all the content. N readers are started and get each a queue where to fill this round data, and issue COPY while main reader continue parsing. Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 43. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Round robin reader Parsing is all done by a single thread for all the content. N readers are started and get each a queue where to fill this round data, and issue COPY while main reader continue parsing. Example [rrr] section_threads = 3 split_file_reading = False Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 44. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Split file reader The file is split into N blocks and there’s as much pgloader doing the same job in parallel as there are blocks. Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 45. Outline Introduction Main components Architecture Parallel Organisation Configuration examples & Usage Current status & TODO Split file reader The file is split into N blocks and there’s as much pgloader doing the same job in parallel as there are blocks. Example [rrr] section_threads = 3 split_file_reading = True Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 46. Outline Introduction Architecture Configuration examples & Usage Current status & TODO Examples PGLoader distribution comes with diverse examples, don’t forget to see about them. Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 47. Outline Introduction Architecture Configuration examples & Usage Current status & TODO simple That simple: Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 48. Outline Introduction Architecture Configuration examples & Usage Current status & TODO simple That simple: Example [simple] table = simple filename = simple/simple.data format = text datestyle = dmy field_sep = | trailing_sep = True columns = * Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 49. Outline Introduction Architecture Configuration examples & Usage Current status & TODO User defined columns Constant columns added at parsing time. Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 50. Outline Introduction Architecture Configuration examples & Usage Current status & TODO User defined columns Constant columns added at parsing time. Use case: adding an origin server id field depending on the file to get loaded, for data aggregation. Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 51. Outline Introduction Architecture Configuration examples & Usage Current status & TODO User defined columns Constant columns added at parsing time. Use case: adding an origin server id field depending on the file to get loaded, for data aggregation. Example [server_A] file = imports/A.csv columns = b:2, d:1, x:3, y:4 udc_c = A copy_columns = b, c, d Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 52. Outline Introduction Architecture Configuration examples & Usage Current status & TODO User defined Reformating modules The basic idea is to avoid any pre-processing done with another tool (sed, awk, you name it). Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 53. Outline Introduction Architecture Configuration examples & Usage Current status & TODO User defined Reformating modules The basic idea is to avoid any pre-processing done with another tool (sed, awk, you name it). file has ’12131415’ Example [fixed] table = fixed format = fixed filename = fixed/fixed.data columns = * fixed_specs = a:0:10, b:10:8, c:18:8, d:26:17 reformat = c:pgtime:time Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 54. Outline Introduction Architecture Configuration examples & Usage Current status & TODO User defined Reformating modules The basic idea is to avoid any pre-processing done with another tool (sed, awk, you name it). file has ’12131415’ we want ’12:13:14.15’ Example def time(reject, input): quot;quot;quot; Reformat str as a PostgreSQL time quot;quot;quot; if len(input) != 8: reject.log(mesg, input) hour = input[0:2] ... return ’%s:%s:%s.%s’ % (hour, min, secs, cents) Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 55. Outline Introduction Architecture Configuration examples & Usage Current status & TODO The fine manual says it all At http://pgloader.projects.postgresql.org/ or man pgloader Example > pgloader --help > pgloader --version > pgloader -DTsc pgloader.conf Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 56. Outline Introduction Architecture Configuration examples & Usage Current status & TODO TODO http://pgloader.projects.postgresql.org/dev/TODO.html Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 57. Outline Introduction Architecture Configuration examples & Usage Current status & TODO TODO http://pgloader.projects.postgresql.org/dev/TODO.html Constraint Exclusion support Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 58. Outline Introduction Architecture Configuration examples & Usage Current status & TODO TODO http://pgloader.projects.postgresql.org/dev/TODO.html Constraint Exclusion support Reject Behaviour Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 59. Outline Introduction Architecture Configuration examples & Usage Current status & TODO TODO http://pgloader.projects.postgresql.org/dev/TODO.html Constraint Exclusion support Reject Behaviour XML support with user defined XSLT StyleSheet Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 60. Outline Introduction Architecture Configuration examples & Usage Current status & TODO TODO http://pgloader.projects.postgresql.org/dev/TODO.html Constraint Exclusion support Reject Behaviour XML support with user defined XSLT StyleSheet Facilities Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 61. Outline Introduction Architecture Configuration examples & Usage Current status & TODO TODO http://pgloader.projects.postgresql.org/dev/TODO.html Constraint Exclusion support Reject Behaviour XML support with user defined XSLT StyleSheet Facilities Don’t be shy and just ask for new features! Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 62. Outline Introduction Architecture Configuration examples & Usage Current status & TODO Resources and Users pgfoundry, 1 developper, some users, no mailing list yet (no one asking for one), some mails sometime, seldom bug reports (fixed) Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 63. Outline Introduction Architecture Configuration examples & Usage Current status & TODO Resources and Users pgfoundry, 1 developper, some users, no mailing list yet (no one asking for one), some mails sometime, seldom bug reports (fixed) Support ongoing at #postgresql and #postgresqlfr Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
  • 64. Outline Introduction Architecture Configuration examples & Usage Current status & TODO Resources and Users pgfoundry, 1 developper, some users, no mailing list yet (no one asking for one), some mails sometime, seldom bug reports (fixed) Support ongoing at #postgresql and #postgresqlfr packages for debian, FreeBSD, OpenBSD, CentOS, RHEL and Fedora. Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL