Leveraging Hadoop for Legacy Systems

   Mathias Herberts - @herberts
Crédit Mutuel Arkéa key Facts & Figures (as of 2011-06-30)
A Regional Bank with a National Network
Why Hadoop?
▪ Ever increasing volume of data

▪ Very regulated sector (Basel II/III, Solvency II)

    ▪ Need to produce compliance reports

▪ Competitive sector

    ▪ Need to create value, data identified as a great source of it

▪ Keep costs under control
▪ Fond of Open Source
▪ Engineers like big challenges
What Challenge?
Storing Data
Types of logical storage


      Virtual Storage Access Method
      Record-oriented (fixed or variable length) indexed datasets




      Physically Sequential
      Record-oriented (fixed or variable length) datasets, not indexed
      Can exist on different types of media




      IBM DB2 Relational Model Database Server
Types of binary records stored

COBOL Records (conform to a COPYBOOK)




DB2 'UNLOAD' Records (conform to a DDL statement)
Types of data stored in HDFS


      {Tab, Comma, ...} Separated Values
      One line records of multiple columns




      Text
      Line-oriented (e.g. logs)




      Hadoop SequenceFiles
      Block compressed
       ▪ Mostly BytesWritable key/value
        ▪ COBOL records
        ▪ DB2 unloaded records
        ▪ Serialized Thrift structures
       ▪ Use of DefaultCodec (pure Java)
Moving Data
Standard data transfer process




  ▪ On the fly charset conversion
  ▪ Loss of notion of records
Hadoop data transfer process




  ▪ On the fly compression
  ▪ Keep original charset
  ▪ Preserved notion of records
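Why keeping the original charset matters can be illustrated with a small sketch. The record below is hypothetical: four EBCDIC characters followed by a packed-decimal (COMP-3) amount. Python's cp500 codec is used as a stand-in EBCDIC code page, which is an assumption; the slides do not say which code page the standard transfer converts from.

```python
# Hypothetical 12-byte record: "ABCD" in EBCDIC, then +12345.67 as COMP-3.
TEXT_PART = bytes.fromhex("C1C2C3C4")
PACKED_PART = bytes.fromhex("000000001234567C")
RECORD = TEXT_PART + PACKED_PART


def text_mode_transfer(raw: bytes) -> bytes:
    """Model a transfer that converts every byte from EBCDIC to ASCII."""
    # cp500 is an assumed stand-in for the mainframe code page.
    return raw.decode("cp500").encode("ascii", errors="replace")


def binary_transfer(raw: bytes) -> bytes:
    """Model a transfer that leaves the bytes untouched."""
    return bytes(raw)


converted = text_mode_transfer(RECORD)
preserved = binary_transfer(RECORD)

print(converted[:4] == b"ABCD")       # the character field is readable ASCII
print(converted[4:] == PACKED_PART)   # the packed amount was rewritten
print(preserved == RECORD)            # the binary transfer changed nothing
```

The character field comes out fine, but the packed amount is treated as text and rewritten, so it can no longer be decoded. Shipping the bytes unchanged and interpreting them on the Hadoop side avoids this.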
Staging Server



▪ Gateway In & Out of an HDFS Cell
▪ Reads/Writes to /hdfs/staging/{in,out}/... (runs as hdfs)
▪ HTTP Based (POST/GET)

▪ Upload to http://hadoop-staging/put[/hdfs/staging/in/...]
   Stores directly in HDFS, no intermediary storage
   Multiple files support
   Random target directory created if none specified
   Parameters user, group, perm, suffix
   curl -F "file=@local;filename=remote" "http://../put?user=foo&group=bar&perm=644&suffix=.test"


▪ Download from http://hadoop-staging/get/hdfs/staging/out/...
   Ability to unpack SequenceFile records (unpack={base64,hex}) as key:value lines
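A minimal client-side sketch of the two operations above. The host name and the parameter names (user, group, perm, suffix) come from the slide; the target directory `/hdfs/staging/in/demo` and the exact layout of an unpacked line (`<key>:<value>`, both halves hex or base64 encoded) are assumptions made for illustration.

```python
import base64
from typing import Tuple
from urllib.parse import urlencode

STAGING_BASE = "http://hadoop-staging"  # host name taken from the slide


def put_url(target_dir: str = "", **params: str) -> str:
    """Build an upload URL; target_dir is optional, as on the slide."""
    url = f"{STAGING_BASE}/put{target_dir}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


def parse_unpacked_line(line: str, encoding: str = "hex") -> Tuple[bytes, bytes]:
    """Decode one key:value line (assumed layout: <key>:<value>)."""
    key_text, value_text = line.rstrip("\n").split(":", 1)
    decode = bytes.fromhex if encoding == "hex" else base64.b64decode
    return decode(key_text), decode(value_text)


# Hypothetical target directory, for illustration only.
print(put_url("/hdfs/staging/in/demo", user="foo", group="bar", perm="644"))
print(parse_unpacked_line("6b31:c1c2", "hex"))
```

Neither hex nor base64 output contains a colon, so splitting on the first `:` is enough to separate key from value under the assumed layout.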
fileutil



▪   Swiss Army Knife for SequenceFiles, HDFS Staging Server, ZooKeeper
▪   Written in Java, single jar
▪   Works in all our environments (z/OS, Unix, Windows, ...)
▪   Can be run using TWS/OPC on z/OS (via a JCL), $Universe on Unix, cron ...
▪   Multiple commands
      sfstage            Convert a z/OS dataset to a SF and push it to the staging server
      {stream,file}stage Push a stream or files to the staging server
      filesfstage        Convert a file to a SF (one record per block) and stage it
      sfpack             Pack key:value lines (cf unpack) in a SequenceFile
      sfarchive          Create a SequenceFile, one record per input file
      zk{ls,lsr,cat,stat} Read data from ZooKeeper
      get                Retrieve data via URI
      ...
Accessing Data
Data Organization


▪ Use of a directory structure that mimics the dataset names

      PR0172.PVS00.F7209588

      Environment / Silo / Application

      /hdfs/data/si/PR/01/72/PR0172.PVS00.F7209588.SUFFIX

▪   Group ACLs at the Environment/Silo/Application levels
▪   Suffix is mainly used to add .YYYYMM to Generation Data Groups
▪   Suffix added by the staging server
▪   DB2 Table unloads follow similar rules

      P11DBA.T90XXXX
      S4SDWH11.T4S02CTSC_H
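The dataset-to-path rule above can be sketched as follows. The split of the first qualifier into three two-character parts (Environment, Silo, Application) is inferred from the single example on the slide and is an assumption; the function name is hypothetical, and DB2 table names are not handled.

```python
def dataset_to_hdfs_path(dataset: str, suffix: str = "") -> str:
    """Map a z/OS dataset name to its HDFS location (assumed rule)."""
    first_qualifier = dataset.split(".", 1)[0]
    # Assumed layout of the first qualifier: EE SS AA (2 characters each).
    env = first_qualifier[0:2]
    silo = first_qualifier[2:4]
    app = first_qualifier[4:6]
    path = f"/hdfs/data/si/{env}/{silo}/{app}/{dataset}"
    return f"{path}.{suffix}" if suffix else path


print(dataset_to_hdfs_path("PR0172.PVS00.F7209588", "201106"))
```

Because each level is its own directory, group permissions can be set once per Environment, Silo or Application directory.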
Bastion Hosts



▪   Hadoop Cells are isolated, all accesses MUST go through a bastion host
▪   All accesses to the bastion hosts are authenticated via SSH keys
▪   Users log in using their own user
▪   No SSH port forwarding allowed
▪   All shell commands are logged
▪   Batches scheduled on bastion hosts by $Universe (use of ssh-agent)

▪ Bastion hosts can interact with their HDFS cell (hadoop fs commands)
▪ Bastion hosts can launch jobs

▪ Admin tasks, user provisioning done on NameNode

▪ Kerberos Security not used (yet?)
▪ Need for pluggable security mechanism, using SSH signed tokens
Working With Data
We are a Piggy bank ...
                      Attribution: www.seniorliving.org
Why Pig?



▪ We <3 the '1 relation per line' approach, « no SQHell™ »




▪ No metadata service to maintain
▪ Ability to add UDFs
    ▪ A whole lot already added, more on this later...

▪ Batch scheduling
▪ Can handle all the data we store in HDFS

▪ Still open to other tools (Hive, Zohmg, ...)
com.arkea.commons.pig.SequenceFileLoadFunc




▪ Generic load function for our BytesWritable SequenceFiles
▪ Relies on Helper classes to interpret the record bytes
    SequenceFileLoadFunc('HelperClass', 'param', ...)
▪ Helper classes can also be used in regular MapReduce jobs

▪ SequenceFileLoadFunc outputs the following schema

{
    key: bytearray,
    value: bytearray,
    parsed: (
      Helper dependent schema
    )
}
Helper Classes



▪ COBOL – com.arkea.commons.pig.COBOLBinaryHelper
        ▪ COPYBOOK
▪ Thrift – com.arkea.commons.pig.ThriftBinaryHelper
        ▪ .class
▪ DB2 Unload – com.arkea.commons.pig.DB2UnloadBinaryHelper
        ▪ DDL + load script
▪ MySQL – com.arkea.commons.pig.MySQLBinaryHelper
        ▪ DDL
▪ ...
Initial Pig Target




           'proc sql' SAS Corpus
                          from sample to population


Need to give users tools that can reproduce what they did in their scripts
Groovy Closure Pig UDF



DEFINE InlineGroovyUDF cac.pig.udf.GroovyClosure(SCHEMA, CODE);

DEFINE FileGroovyUDF cac.pig.udf.GroovyClosure(SCHEMA, '/path/to/closure.groovy');




SCHEMA uses the standard Pig Schema syntax, e.g. 'str: chararray'

CODE is a short Groovy Closure, e.g. '{ a,b,c -> return a.replaceAll(b,c); }'

closure.groovy must be in a REGISTERed jar under path/to
//
// Import statements
//

import ....;

//
// Constants definitions
//

/**
 * Documentation for XXX
 */
final def XXX = ....;

//
// Closure definition
//

/**
  * Documentation for CLOSURE
  *
  * @param a ...
  * @param b ...
  * @param ...
  *
  * @return ...
  */
final def CLOSURE = {
    a,b,... ->
    ...
    ...
    return ...;
}

//
// Unit Tests
//

// Test specific comment ...
assert CLOSURE('A') == ...;

//
// Return Closure for usage in Pig
//

return CLOSURE;
Pig to Groovy

bag -> java.util.List
tuple -> Object[]
map -> java.util.Map
int -> int
long -> long
float -> float
double -> double
chararray -> java.lang.String
bytearray -> byte[]

Groovy to Pig

groovy.lang.Tuple -> tuple
Object[] -> tuple
java.util.List -> bag
java.util.Map -> map
byte/short/int -> int
long/BigInteger -> long
float -> float
double/BigDecimal -> double
java.lang.String -> chararray
byte[] -> bytearray
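The BigInteger -> long and BigDecimal -> double mappings above are lossy for large values. The sketch below checks whether a value survives the conversion, with Python's `int`, `Decimal` and 64-bit `float` standing in for the Java types; the helper names are hypothetical.

```python
from decimal import Decimal

LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1


def fits_in_long(value: int) -> bool:
    """True if an arbitrary-precision integer fits a 64-bit signed long."""
    return LONG_MIN <= value <= LONG_MAX


def survives_double(value: Decimal) -> bool:
    """True if the decimal reads back unchanged after a trip through a double."""
    return Decimal(repr(float(value))) == value


# 15 significant digits, the size of a PIC S9(13)V9(2) amount: safe.
print(survives_double(Decimal("9999999999999.99")))
# 18 significant digits: the cents are no longer exact.
print(survives_double(Decimal("1234567890123456.78")))
print(fits_in_long(2**63))
```

A double carries a 53-bit significand, which is enough for about 15 decimal digits. Wider amounts need a decimal type end to end.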
Wrap Up
⊕

▪ Fast and rich data pipeline between z/OS and Hadoop

▪ Pig Toolbox to analyze COBOL/DB2 data alongside Thrift/MySQL/xSV/...

▪ Groovy Closure support for rapid extension


▪ Still some missing features
    Pure Java compression codecs (JNI on z/OS anyone?)
    Pig support for BigInteger / BigDecimal (2^45 might not be enough)
    SSH(RSA) based auth tokens



▪ And yet another hard challenge: Cultural Change
http://www.arkea.com/



      @herberts
Appendix
com.arkea.commons.pig.COBOLBinaryHelper
REGISTER copybook.jar;
A = LOAD '$data' USING cacp.SequenceFileLoadFunc('cacp.COBOLBinaryHelper','[PREFIX:]COPYBOOK');

        000010*GAR* OS Y7XRRDC         DESCRIPTION RRDC NOUVAU FORMAT               30000020
        000020* LG=00328, ESD MAJ LE 04/12/98, ELS MAJ LE 26/01/01 PAR   C98310     30000030
        000030* GENERE LE 26/01/01 A 17H01, PFX : Y7XRRD-     MEMBRE :   Y7XRRDC    30000040
        000040 01         Y7XRRD-Y7XRRDC.                                           30000050
        000050*             DESCRIPTION RRDC NOUVAU FORMAT           1   04/12/98   30000060
        000060   03       Y7XRRD-ARTDS-CLE-SECD.                                    30000070
        000070*             CLE SECONDAIRE ARCHIVAGE TENU DE SOLDE   1   11/02/98   30000080
        000080     05     Y7XRRD-NO-CCM       PIC X(4).                             30000090
        000090*             NUMERO CAISSE                            1   28/12/94   30000100
        000100     05     Y7XRRD-NO-PSE       PIC X(8).                             30000110
        000110*             NUMERO PERSONNE                          5   10/07/97   30000120
        000120     05     Y7XRRD-CATEGORIE    PIC X(2).                             30000130
        000130*             CATéGORIE DU COMPTE                     13   09/01/01   30000140
        000140     05     Y7XRRD-RANG         PIC X(2).                             30000150
        010010*             RANG                                    15   22/01/01   30000160
        010020     05     Y7XRRD-NO-ORDRE     PIC X(2).                             30000170
        010030*             Numéro d'ordre                          17   28/12/94   30000180
        010040     05     Y7XRRD-DA-TT-C2     PIC X(8).                             30000190
        010050*             DATE TRAITEMENT                 SX:-C2 19     -   -     30000200
        010060     05     Y7XRRD-NO-ORDRE-ENR-C2 PIC 9(6).                          30000210
        010070*             Numéro d'ordre enregistrement   SX:-C2 27     -   -     30000220
        010080   03       Y7XRRD-MT-OPE-TDS   PIC S9(13)V9(2) COMP-3.               30000230
        010090*             MONTANT OPERATION TENUE-DE-SOLDE        33   03/02/98   30000240
        010100   03       Y7XRRD-CD-DVS-ORI-OPE PIC X(4).                           30000250
        010110*             CODE DEVISE ORIGINE OPERATION           41    -   -     30000260
        010120   03       Y7XRRD-CD-DVS-GTN-TDS PIC X(4).                           30000270
        010130*             CODE DEVISE GESTION TENUE-DE-SOLDE      45    -   -     30000280
        010140   03       Y7XRRD-MT-CNVS-OPE PIC S9(13)V9(2) COMP-3.                30000290
        020010*             MONTANT CONVERTI OPERATION              49    -   -     30000300
        020020   03       Y7XRRD-IDC-ATN-ORI-MT PIC X(1).                           30000310
        020030*             INDICATEUR AUTHENTICITE ORIGINE MONTAN 57    05/12/97   30000320
        020040   03       Y7XRRD-SLD-AV-IMPT PIC S9(13)V9(2) COMP-3.                30000330
        020050*             SOLDE AVANT IMPUTATION                  58   03/02/98   30000340
        020060   03       Y7XRRD-DA-OPE-TDS   PIC X(8).                             30000350
        020070*             DATE OPERATION TENUE-DE-SOLDE           66    -   -     30000360
        020080   03       Y7XRRD-DA-VLR       PIC X(8).                             30000370
        020090*             DATE VALEUR                             74   28/12/94   30000380
        020100   03       Y7XRRD-DA-ARR       PIC X(8).                             30000390
        020110*             DATE ARRETE                             82    -   -     30000400
        020120   03       Y7XRRD-NO-STR-OPE   PIC X(6).                             30000410
        020130*             NUMERO STRUCTURE OPERATIONNELLE         90    -   -     30000420
        020140   03       Y7XRRD-NO-REF-TNL-MED PIC X(4).                           30000430
        030010*             NUMERO REFERENCE TERMINAL MEDIA         96   03/02/98   30000440
        030020   03       Y7XRRD-NO-LOT       PIC X(3).                             30000450
        030030*             NUMéRO DE LOT                          100   13/10/97   30000460
        030040   03       Y7XRRD-TDS-LIBELLES.                                      30000470
        030050*             FAMILLE MONTANTS OPERATION T.DE.SOLDE 103    05/02/98   30000480
        030060     05     Y7XRRD-LIB-CLI-OPE-1 PIC X(50).                           30000490
        030070*             LIBELLE CLIENT OPERATION        SX:-1 103    03/02/98   30000500
        030080     05     Y7XRRD-LIB-ITE-OPE PIC X(32).                             30000510
        030090*             LIBELLE INTERNE OPERATION              153    -   -     30000520
        030100     05     Y7XRRD-LIB-CT-CLI   PIC X(32).                            30000530
        030110*             LIBELLE COURT CLIENT                   185    -   -     30000540
        030120   03       Y7XRRD-CD-UTI-LIB-CPL PIC X(1).                           30000550
        030130*             Code utilisation libellés compl.       217   28/12/94   30000560
        030140   03       Y7XRRD-IDC-COM-OPE PIC X(1).                              30000570
        040010*             INDICATEUR COMMISSION OPERATION        218   03/02/98   30000580
        040020   03       Y7XRRD-CD-TY-OPE-NIV-1 PIC X(1).                          30000590
        040030*             CODE TYPE OPERATION NIVEAU UN          219    -   -     30000600
        040040   03       Y7XRRD-CD-TY-OPE-NIV-2 PIC X(2).                          30000610
        040050*             CODE TYPE OPERATION NIVEAU DEUX        220    -   -     30000620
        040060   03       FILLER              PIC X(7).                             30000630
        040070*                                                    222              30000640
        040080   03       Y7XRRD-TDS-LIB-SUPPL.                                     30000650
        040090*             FAMILLE LIBELLES COMPLEMENTAIRES T.D.S 229   17/02/98   30000660
        040100     05     Y7XRRD-LIB-CLI-OPE-02 PIC X(50).                          30000670
        040110*             LIBELLE CLIENT OPERATION        SX:-02 229   03/02/98   30000680
        040120     05     Y7XRRD-LIB-CLI-OPE-03 PIC X(50).                          30000690
        040130*             LIBELLE CLIENT OPERATION        SX:-03 279    -   -     30000700

        A: {
          key: bytearray,
          value: bytearray,
          parsed: (
            Y7XRRD_Y7XRRDC: bytearray,
            Y7XRRD_ARTDS_CLE_SECD: bytearray,
            Y7XRRD_NO_CCM: chararray,
            Y7XRRD_NO_PSE: chararray,
            Y7XRRD_CATEGORIE: chararray,
            Y7XRRD_RANG: chararray,
            Y7XRRD_NO_ORDRE: chararray,
            Y7XRRD_DA_TT_C2: chararray,
            Y7XRRD_NO_ORDRE_ENR_C2: long,
            Y7XRRD_MT_OPE_TDS: double,
            Y7XRRD_CD_DVS_ORI_OPE: chararray,
            Y7XRRD_CD_DVS_GTN_TDS: chararray,
            Y7XRRD_MT_CNVS_OPE: double,
            Y7XRRD_IDC_ATN_ORI_MT: chararray,
            Y7XRRD_SLD_AV_IMPT: double,
            Y7XRRD_DA_OPE_TDS: chararray,
            Y7XRRD_DA_VLR: chararray,
            Y7XRRD_DA_ARR: chararray,
            Y7XRRD_NO_STR_OPE: chararray,
            Y7XRRD_NO_REF_TNL_MED: chararray,
            Y7XRRD_NO_LOT: chararray,
            Y7XRRD_TDS_LIBELLES: bytearray,
            Y7XRRD_LIB_CLI_OPE_1: chararray,
            Y7XRRD_LIB_ITE_OPE: chararray,
            Y7XRRD_LIB_CT_CLI: chararray,
            Y7XRRD_CD_UTI_LIB_CPL: chararray,
            Y7XRRD_IDC_COM_OPE: chararray,
            Y7XRRD_CD_TY_OPE_NIV_1: chararray,
            Y7XRRD_CD_TY_OPE_NIV_2: chararray,
            FILLER: chararray,
            Y7XRRD_TDS_LIB_SUPPL: bytearray,
            Y7XRRD_LIB_CLI_OPE_02: chararray,
            Y7XRRD_LIB_CLI_OPE_03: chararray
          )
        }
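To give an idea of what a helper has to do with such a record, here is a minimal sketch that decodes three fields of the copybook above: a PIC X field, a PIC 9 (zoned) field and a COMP-3 (packed-decimal) amount. It is not the actual COBOLBinaryHelper. The code page (cp500), the function names and the sample values are assumptions, and the amount is returned as a Decimal rather than the double shown in the schema.

```python
from decimal import Decimal

EBCDIC = "cp500"  # assumption: stand-in code page, not confirmed by the slides

# (field name, 0-based offset, length in bytes, kind). Offsets follow the
# 1-based positions printed in the copybook comments (1, 27 and 33).
LAYOUT = [
    ("Y7XRRD_NO_CCM", 0, 4, "text"),              # PIC X(4)
    ("Y7XRRD_NO_ORDRE_ENR_C2", 26, 6, "zoned"),   # PIC 9(6)
    ("Y7XRRD_MT_OPE_TDS", 32, 8, "comp3"),        # PIC S9(13)V9(2) COMP-3
]


def unpack_comp3(raw: bytes, scale: int) -> Decimal:
    """Decode packed decimal: two digits per byte, sign in the last nibble."""
    digits = "".join(f"{byte >> 4}{byte & 0x0F}" for byte in raw[:-1])
    digits += str(raw[-1] >> 4)
    negative = (raw[-1] & 0x0F) in (0x0B, 0x0D)
    value = Decimal(int(digits)).scaleb(-scale)
    return -value if negative else value


def parse_record(record: bytes) -> dict:
    """Decode the fields listed in LAYOUT from one fixed-length record."""
    parsed = {}
    for name, offset, length, kind in LAYOUT:
        raw = record[offset:offset + length]
        if kind == "text":
            parsed[name] = raw.decode(EBCDIC)
        elif kind == "zoned":
            parsed[name] = int(raw.decode(EBCDIC))
        else:
            parsed[name] = unpack_comp3(raw, scale=2)
    return parsed


# Hypothetical 328-byte record (LG=00328), padded with EBCDIC spaces.
sample = bytearray(b"\x40" * 328)
sample[0:4] = "0123".encode(EBCDIC)
sample[26:32] = "000042".encode(EBCDIC)
sample[32:40] = bytes.fromhex("000000001234567C")
print(parse_record(bytes(sample)))
```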
com.arkea.commons.pig.DB2UnloadBinaryHelper
 REGISTER ddl-load.jar;
 A = LOAD '$data' USING cacp.SequenceFileLoadFunc('cacp.DB2UnloadBinaryHelper','[PREFIX:]TABLE');



.ddl    CREATE TABLE SHDBA.TBDCOLS
        (COL_CHAR CHAR(4) FOR SBCS DATA WITH DEFAULT NULL,
        COL_DECIMAL DECIMAL(15, 2) WITH DEFAULT NULL,
        COL_NUMERIC DECIMAL(15, 0) WITH DEFAULT NULL,
        COL_SMALLINT SMALLINT WITH DEFAULT NULL,
        COL_INTEGER INTEGER WITH DEFAULT NULL,
        COL_VARCHAR VARCHAR(50) FOR SBCS DATA WITH DEFAULT NULL,
        COL_DATE DATE WITH DEFAULT NULL,
        COL_TIME TIME WITH DEFAULT NULL,
        COL_TIMESTAMP TIMESTAMP WITH DEFAULT NULL) ;

.load   TEMPLATE DFEM8ERT
        DSN('XXXXX.PPSDR.B99BD02.SBDCOLS.REC')
        DISP(OLD,KEEP,KEEP)
        LOAD DATA INDDN DFEM8ERT LOG NO RESUME YES
        EBCDIC CCSID(01147,00000,00000)
        INTO TABLE "SHDBA"."TBDCOLS"
        WHEN(00001:00002) = X'003F'
        ( "COL_CHAR" POSITION( 00004:00007) CHAR(00004) NULLIF(00003)=X'FF',
        "COL_DECIMAL" POSITION( 00009:00016) DECIMAL NULLIF(00008)=X'FF',
        "COL_NUMERIC" POSITION( 00018:00025) DECIMAL NULLIF(00017)=X'FF',
        "COL_SMALLINT" POSITION( 00027:00028) SMALLINT NULLIF(00026)=X'FF',
        "COL_INTEGER" POSITION( 00030:00033) INTEGER NULLIF(00029)=X'FF',
        "COL_VARCHAR" POSITION( 00035:00086) VARCHAR NULLIF(00034)=X'FF',
        "COL_DATE" POSITION( 00088:00097) DATE EXTERNAL NULLIF(00087)=X'FF',
        "COL_TIME" POSITION( 00099:00106) TIME EXTERNAL NULLIF(00098)=X'FF',
        "COL_TIMESTAMP" POSITION( 00108:00133) TIMESTAMP EXTERNAL NULLIF(00107)=X'FF'
        )

        A: {
          key: bytearray,
          value: bytearray,
          parsed: (
            COL_CHAR: chararray,
            COL_DECIMAL: double,
            COL_NUMERIC: long,
            COL_SMALLINT: long,
            COL_INTEGER: long,
            COL_VARCHAR: chararray,
            COL_DATE: chararray,
            COL_TIME: chararray,
            COL_TIMESTAMP: chararray
          )
        }



Can also handle DB2 UDB unloads (done using hpu)
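The load script is what makes an unloaded record parseable: each column has a byte range (POSITION) and a one-byte null indicator (NULLIF ... = X'FF'). The sketch below applies those positions to three of the columns. It is an illustration, not the actual DB2UnloadBinaryHelper; cp500 stands in for CCSID 1147 (an assumption, since Python ships no cp1147 codec), and big-endian signed integers are assumed for SMALLINT and INTEGER.

```python
EBCDIC = "cp500"  # assumption: stand-in for CCSID 1147

# (column, null indicator position, first byte, last byte, kind);
# positions are 1-based and inclusive, copied from the load script above.
FIELDS = [
    ("COL_CHAR", 3, 4, 7, "char"),
    ("COL_SMALLINT", 26, 27, 28, "int"),
    ("COL_INTEGER", 29, 30, 33, "int"),
]


def parse_unload_record(record: bytes) -> dict:
    """Decode the columns in FIELDS, mapping X'FF' indicators to None."""
    parsed = {}
    for name, null_pos, first, last, kind in FIELDS:
        if record[null_pos - 1] == 0xFF:
            parsed[name] = None
            continue
        raw = record[first - 1:last]
        if kind == "char":
            parsed[name] = raw.decode(EBCDIC)
        else:
            # Assumed encoding: big-endian two's complement.
            parsed[name] = int.from_bytes(raw, "big", signed=True)
    return parsed


# Hypothetical 133-byte record: COL_CHAR = "ABCD", COL_SMALLINT is NULL,
# COL_INTEGER = 1234.
sample = bytearray(133)
sample[0:2] = b"\x00\x3f"                  # matches WHEN(00001:00002) = X'003F'
sample[3:7] = "ABCD".encode(EBCDIC)
sample[25] = 0xFF                          # null indicator for COL_SMALLINT
sample[29:33] = (1234).to_bytes(4, "big", signed=True)
print(parse_unload_record(bytes(sample)))
```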
we're Thrifty too...
   REGISTER thrift-generated.jar;
   A = LOAD '$data' USING cacp.SequenceFileLoadFunc('cacp.ThriftBinaryHelper','CLASS');



        struct Redirection{
          1: string alias,
          2: string url,
          3: string email,
          4: i64 timestamp,
          5: i64 lastupdate,
          6: list<string> params,
          7: bool external = 1,
          8: i64 owner,
          9: string user,
        }

        A: {
          key: bytearray,
          value: bytearray,
          parsed: (
            alias: chararray,
            url: chararray,
            email: chararray,
            timestamp: long,
            lastupdate: long,
            params: (),
            external: long,
            owner: long,
            user: chararray
          )
        }




... and also use MySQL ...

   REGISTER mysql-ddl.jar;
   A = LOAD '$data' USING cacp.SequenceFileLoadFunc('cacp.MySQLBinaryHelper','TABLE');



... etc etc etc ...

Hadoop World 2011: Leveraging Hadoop for Legacy Systems - Mathias Herberts, Credit Mutuel Arkea

Examining Oracle GoldenGate Trail Files
Examining Oracle GoldenGate Trail FilesExamining Oracle GoldenGate Trail Files
Examining Oracle GoldenGate Trail FilesBobby Curtis
 
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterDUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterAndrey Kudryavtsev
 
Vectorization on x86: all you need to know
Vectorization on x86: all you need to knowVectorization on x86: all you need to know
Vectorization on x86: all you need to knowRoberto Agostino Vitillo
 
11thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp0111thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp01Karam Abuataya
 
11 Things About11g
11 Things About11g11 Things About11g
11 Things About11gfcamachob
 
Hopping in clouds - phpuk 17
Hopping in clouds - phpuk 17Hopping in clouds - phpuk 17
Hopping in clouds - phpuk 17Michele Orselli
 
Cryptography and secure systems
Cryptography and secure systemsCryptography and secure systems
Cryptography and secure systemsVsevolod Stakhov
 
Direct SGA access without SQL
Direct SGA access without SQLDirect SGA access without SQL
Direct SGA access without SQLKyle Hailey
 
InfluxDB IOx Tech Talks: A Rusty Introduction to Apache Arrow and How it App...
InfluxDB IOx Tech Talks:  A Rusty Introduction to Apache Arrow and How it App...InfluxDB IOx Tech Talks:  A Rusty Introduction to Apache Arrow and How it App...
InfluxDB IOx Tech Talks: A Rusty Introduction to Apache Arrow and How it App...InfluxData
 
Gaztea Tech Robotica 2016
Gaztea Tech Robotica 2016Gaztea Tech Robotica 2016
Gaztea Tech Robotica 2016Svet Ivantchev
 
Aws Quick Dirty Hadoop Mapreduce Ec2 S3
Aws Quick Dirty Hadoop Mapreduce Ec2 S3Aws Quick Dirty Hadoop Mapreduce Ec2 S3
Aws Quick Dirty Hadoop Mapreduce Ec2 S3Skills Matter
 
Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)
Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)
Spectre(v1%2 fv2%2fv4) v.s. meltdown(v3)Gavin Guo
 
Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120
Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120
Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120Linaro
 
OpenWorld Sep14 12c for_developers
OpenWorld Sep14 12c for_developersOpenWorld Sep14 12c for_developers
OpenWorld Sep14 12c for_developersConnor McDonald
 
Designing High Performance RTC Signaling Servers
Designing High Performance RTC Signaling ServersDesigning High Performance RTC Signaling Servers
Designing High Performance RTC Signaling ServersDaniel-Constantin Mierla
 
REST made simple with Java
REST made simple with JavaREST made simple with Java
REST made simple with Javaelliando dias
 
CONFidence 2017: Hacking embedded with OpenWrt (Vladimir Mitiouchev)
CONFidence 2017: Hacking embedded with OpenWrt (Vladimir Mitiouchev)CONFidence 2017: Hacking embedded with OpenWrt (Vladimir Mitiouchev)
CONFidence 2017: Hacking embedded with OpenWrt (Vladimir Mitiouchev)PROIDEA
 

Ähnlich wie Hadoop World 2011: Leveraging Hadoop for Legacy Systems - Mathias Herberts, Credit Mutuel Arkea (20)

Examining Oracle GoldenGate Trail Files
Examining Oracle GoldenGate Trail FilesExamining Oracle GoldenGate Trail Files
Examining Oracle GoldenGate Trail Files
 
dotCloud and go
dotCloud and godotCloud and go
dotCloud and go
 
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterDUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
 
Dpdk applications
Dpdk applicationsDpdk applications
Dpdk applications
 
Vectorization on x86: all you need to know
Vectorization on x86: all you need to knowVectorization on x86: all you need to know
Vectorization on x86: all you need to know
 
11thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp0111thingsabout11g 12659705398222 Phpapp01
Hadoop World 2011: Leveraging Hadoop for Legacy Systems - Mathias Herberts, Credit Mutuel Arkea

  • 1. Leveraging Hadoop for Legacy Systems Mathias Herberts - @herberts
  • 2. Crédit Mutuel Arkéa key Facts & Figures (as of 2011-06-30)
  • 3. A Regional Bank with a National Network
  • 5. Why Hadoop?
    ▪ Ever increasing volume of data
    ▪ Very regulated sector (Basel II/III, Solvency II)
      ▪ Need to produce compliance reports
    ▪ Competitive sector
      ▪ Need to create value, data identified as a great source of it
    ▪ Keep costs under control
    ▪ Fond of Open Source
    ▪ Engineers like big challenges
  • 9. Types of logical storage
    Virtual Storage Access Method
      Record-oriented (fixed or variable length) indexed datasets
    Physically Sequential
      Record-oriented (fixed or variable length) datasets, not indexed
      Can exist on different types of media
    IBM DB2 Relational Model Database Server
  • 10. Types of binary records stored
    COBOL Records (conform to a COPYBOOK)
    DB2 'UNLOAD' Records (conform to a DDL statement)
  • 11. Types of data stored in HDFS
    {Tab, Comma, ...} Separated Values
      One line records of multiple columns
    Text
      Line-oriented (eg logs)
    Hadoop SequenceFiles
      Block compressed
      ▪ Mostly BytesWritable key/value
        ▪ COBOL records
        ▪ DB2 unloaded records
        ▪ Serialized Thrift structures
      ▪ Use of DefaultCodec (pure Java)
  • 13. Standard data transfer process
    ▪ On the fly charset conversion
    ▪ Loss of notion of records
  • 14. Hadoop data transfer process
    ▪ On the fly compression
    ▪ Keep original charset
    ▪ Preserved notion of records
  • 15. Staging Server
    ▪ Gateway In & Out of an HDFS Cell
    ▪ Reads/Writes to /hdfs/staging/{in,out}/... (runs as hdfs)
    ▪ HTTP Based (POST/GET)
    ▪ Upload to http://hadoop-staging/put[/hdfs/staging/in/...]
      Stores directly in HDFS, no intermediary storage
      Multiple files support
      Random target directory created if none specified
      Parameters user, group, perm, suffix
      curl -F "file=@local;filename=remote" "http://../put?user=foo&group=bar&perm=644&suffix=.test"
    ▪ Download from http://hadoop-staging/get/hdfs/staging/out/...
      Ability to unpack SequenceFile records (unpack={base64,hex}) as key:value lines
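The download endpoint can return SequenceFile records as key:value lines, encoded with unpack=base64 or unpack=hex. A minimal Python sketch of how a client might turn such lines back into raw bytes is shown here. It assumes each line carries the encoded key, a single colon, then the encoded value, which the slides do not spell out; `decode_unpacked_line` is a hypothetical helper name, not part of the staging server.

```python
import base64
import binascii
from typing import Tuple


def decode_unpacked_line(line: str, encoding: str = "base64") -> Tuple[bytes, bytes]:
    """Split one 'key:value' line and decode both halves to raw bytes.

    Assumption: key and value are encoded separately and joined by one colon
    (neither the base64 nor the hex alphabet contains ':').
    """
    key_text, sep, value_text = line.rstrip("\r\n").partition(":")
    if not sep:
        raise ValueError("expected a 'key:value' line")
    if encoding == "base64":
        return base64.b64decode(key_text), base64.b64decode(value_text)
    if encoding == "hex":
        return binascii.unhexlify(key_text), binascii.unhexlify(value_text)
    raise ValueError(f"unsupported encoding: {encoding}")
```

The decoded bytes are still in the original charset (typically EBCDIC for z/OS data), since the transfer process deliberately avoids charset conversion.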
  • 16. fileutil
    ▪ Swiss Army Knife for SequenceFiles, HDFS Staging Server, ZooKeeper
    ▪ Written in Java, single jar
    ▪ Works in all our environments (z/OS, Unix, Windows, ...)
    ▪ Can be run using TWS/OPC on z/OS (via a JCL), $Universe on Unix, cron ...
    ▪ Multiple commands
      sfstage               Convert a z/OS dataset to a SequenceFile (SF) and push it to the staging server
      {stream,file}stage    Push a stream or files to the staging server
      filesfstage           Convert a file to a SF (one record per block) and stage it
      sfpack                Pack key:value lines (cf unpack) in a SequenceFile
      sfarchive             Create a SequenceFile, one record per input file
      zk{ls,lsr,cat,stat}   Read data from ZooKeeper
      get                   Retrieve data via URI
      ...
  • 18. Data Organization
    ▪ Use of a directory structure that mimics the dataset names
        PR0172.PVS00.F7209588
        Environment / Silo / Application
        /hdfs/data/si/PR/01/72/PR0172.PVS00.F7209588.SUFFIX
    ▪ Group ACLs at the Environment/Silo/Application levels
    ▪ Suffix is mainly used to add .YYYYMM to Generation Data Groups
    ▪ Suffix added by the staging server
    ▪ DB2 Table unloads follow similar rules
        P11DBA.T90XXXX
        S4SDWH11.T4S02CTSC_H
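The naming rule above can be sketched in a few lines of Python. This is an illustration only: it assumes that /hdfs/data/si is a fixed base directory and that the first qualifier of the dataset name always starts with three 2-character codes (environment, silo, application), as in the slide's single example. `dataset_to_hdfs_path` is a hypothetical name.

```python
import posixpath

# Assumption: fixed base directory, taken from the slide's example path.
BASE_DIR = "/hdfs/data/si"


def dataset_to_hdfs_path(dataset: str, suffix: str = "") -> str:
    """Map a z/OS dataset name to its HDFS location.

    The first qualifier is read as three 2-character codes:
    environment, silo and application (e.g. 'PR0172' -> PR / 01 / 72).
    """
    first_qualifier = dataset.split(".", 1)[0]
    if len(first_qualifier) < 6:
        raise ValueError(f"unexpected first qualifier: {first_qualifier!r}")
    environment = first_qualifier[0:2]
    silo = first_qualifier[2:4]
    application = first_qualifier[4:6]
    return posixpath.join(BASE_DIR, environment, silo, application, dataset + suffix)
```

With a suffix such as .201106, a Generation Data Group member would land in a per-month file under the same application directory, which is where the group ACLs apply.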
  • 19. Bastion Hosts
    ▪ Hadoop Cells are isolated, all accesses MUST go through a bastion host
    ▪ All accesses to the bastion hosts are authenticated via SSH keys
    ▪ Users log in using their own user
    ▪ No SSH port forwarding allowed
    ▪ All shell commands are logged
    ▪ Batches scheduled on bastion hosts by $Universe (use of ssh-agent)
    ▪ Bastion hosts can interact with their HDFS cell (hadoop fs commands)
    ▪ Bastion hosts can launch jobs
    ▪ Admin tasks, user provisioning done on NameNode
    ▪ Kerberos Security not used (yet?)
    ▪ Need for pluggable security mechanism, using SSH signed tokens
  • 21. We are a Piggy bank ... Attribution: www.seniorliving.org
  • 22. Why Pig?
    ▪ We <3 the '1 relation per line' approach, « no SQHell™ »
    ▪ No metadata service to maintain
    ▪ Ability to add UDFs
      ▪ A whole lot already added, more on this later...
    ▪ Batch scheduling
    ▪ Can handle all the data we store in HDFS
    ▪ Still open to other tools (Hive, Zohmg, ...)
  • 23. com.arkea.commons.pig.SequenceFileLoadFunc
    ▪ Generic load function for our BytesWritable SequenceFiles
    ▪ Relies on Helper classes to interpret the record bytes
        SequenceFileLoadFunc('HelperClass', 'param', ...)
    ▪ Helper classes can also be used in regular MapReduce jobs
    ▪ SequenceFileLoadFunc outputs the following schema
        {
          key: bytearray,
          value: bytearray,
          parsed: ( Helper dependent schema )
        }
  • 24. Helper Classes
    ▪ COBOL – com.arkea.commons.pig.COBOLBinaryHelper
      ▪ COPYBOOK
    ▪ Thrift – com.arkea.commons.pig.ThriftBinaryHelper
      ▪ .class
    ▪ DB2 Unload – com.arkea.commons.pig.DB2UnloadBinaryHelper
      ▪ DDL + load script
    ▪ MySQL – com.arkea.commons.pig.MySQLBinaryHelper
      ▪ DDL
    ▪ ...
  • 25. Initial Pig Target
    'proc sql' SAS Corpus
    from sample to population
    Need to give users tools that can reproduce what they did in their scripts
  • 26. Groovy Closure Pig UDF
      DEFINE InlineGroovyUDF cac.pig.udf.GroovyClosure(SCHEMA, CODE);
      DEFINE FileGroovyUDF cac.pig.udf.GroovyClosure(SCHEMA, '/path/to/closure.groovy');
    SCHEMA uses the standard Pig Schema syntax, e.g. 'str: chararray'
    CODE is a short Groovy Closure, e.g. '{ a,b,c -> return a.replaceAll(b,c); }'
    closure.groovy must be in a REGISTERed jar under path/to
  • 27.
      //
      // Import statements
      //
      import ....;

      //
      // Constants definitions
      //

      /**
       * Documentation for XXX
       */
      final def XXX = ....;

      //
      // Closure definition
      //

      /**
       * Documentation for CLOSURE
       *
       * @param a ...
       * @param b ...
       * @param ...
       *
       * @return ...
       */
      final def CLOSURE = { a,b,... ->
        ...
        ...
        return ...;
      }

      //
      // Unit Tests
      //

      // Test specific comment ...
      assert CLOSURE('A') == ...;

      //
      // Return Closure for usage in Pig
      //
      return CLOSURE;
  • 28. Pig to Groovy
      bag         -> java.util.List
      tuple       -> Object[]
      map         -> java.util.Map
      int         -> int
      long        -> long
      float       -> float
      double      -> double
      chararray   -> java.lang.String
      bytearray   -> byte[]
    Groovy to Pig
      groovy.lang.Tuple   -> tuple
      Object[]            -> tuple
      java.util.List      -> bag
      java.util.Map       -> map
      byte/short/int      -> int
      long/BigInteger     -> long
      float               -> float
      double/BigDecimal   -> double
      java.lang.String    -> chararray
      byte[]              -> bytearray
  • 30. ⊕
    ▪ Fast and rich data pipeline between z/OS and Hadoop
    ▪ Pig Toolbox to analyze COBOL/DB2 data alongside Thrift/MySQL/xSV/...
    ▪ Groovy Closure support for rapid extension
    ▪ Still some missing features
        Pure Java compression codecs (JNI on z/OS anyone?)
        Pig support for BigInteger / BigDecimal (245 might not be enough)
        SSH(RSA) based auth tokens
    ▪ And yet another hard challenge: Cultural Change
  • 33. com.arkea.commons.pig.COBOLBinaryHelper
      REGISTER copybook.jar;
      A = LOAD '$data' USING cacp.SequenceFileLoadFunc('cacp.COBOLBinaryHelper','[PREFIX:]COPYBOOK');

    Copybook:
      000010*GAR* OS Y7XRRDC DESCRIPTION RRDC NOUVAU FORMAT 30000020
      000020* LG=00328, ESD MAJ LE 04/12/98, ELS MAJ LE 26/01/01 PAR C98310 30000030
      000030* GENERE LE 26/01/01 A 17H01, PFX : Y7XRRD- MEMBRE : Y7XRRDC 30000040
      000040 01 Y7XRRD-Y7XRRDC. 30000050
      000050* DESCRIPTION RRDC NOUVAU FORMAT 1 04/12/98 30000060
      000060 03 Y7XRRD-ARTDS-CLE-SECD. 30000070
      000070* CLE SECONDAIRE ARCHIVAGE TENU DE SOLDE 1 11/02/98 30000080
      000080 05 Y7XRRD-NO-CCM PIC X(4). 30000090
      000090* NUMERO CAISSE 1 28/12/94 30000100
      000100 05 Y7XRRD-NO-PSE PIC X(8). 30000110
      000110* NUMERO PERSONNE 5 10/07/97 30000120
      000120 05 Y7XRRD-CATEGORIE PIC X(2). 30000130
      000130* CATéGORIE DU COMPTE 13 09/01/01 30000140
      000140 05 Y7XRRD-RANG PIC X(2). 30000150
      010010* RANG 15 22/01/01 30000160
      010020 05 Y7XRRD-NO-ORDRE PIC X(2). 30000170
      010030* Numéro d'ordre 17 28/12/94 30000180
      010040 05 Y7XRRD-DA-TT-C2 PIC X(8). 30000190
      010050* DATE TRAITEMENT SX:-C2 19 - - 30000200
      010060 05 Y7XRRD-NO-ORDRE-ENR-C2 PIC 9(6). 30000210
      010070* Numéro d'ordre enregistrement SX:-C2 27 - - 30000220
      010080 03 Y7XRRD-MT-OPE-TDS PIC S9(13)V9(2) COMP-3. 30000230
      010090* MONTANT OPERATION TENUE-DE-SOLDE 33 03/02/98 30000240
      010100 03 Y7XRRD-CD-DVS-ORI-OPE PIC X(4). 30000250
      010110* CODE DEVISE ORIGINE OPERATION 41 - - 30000260
      010120 03 Y7XRRD-CD-DVS-GTN-TDS PIC X(4). 30000270
      010130* CODE DEVISE GESTION TENUE-DE-SOLDE 45 - - 30000280
      010140 03 Y7XRRD-MT-CNVS-OPE PIC S9(13)V9(2) COMP-3. 30000290
      020010* MONTANT CONVERTI OPERATION 49 - - 30000300
      020020 03 Y7XRRD-IDC-ATN-ORI-MT PIC X(1). 30000310
      020030* INDICATEUR AUTHENTICITE ORIGINE MONTAN 57 05/12/97 30000320
      020040 03 Y7XRRD-SLD-AV-IMPT PIC S9(13)V9(2) COMP-3. 30000330
      020050* SOLDE AVANT IMPUTATION 58 03/02/98 30000340
      020060 03 Y7XRRD-DA-OPE-TDS PIC X(8). 30000350
      020070* DATE OPERATION TENUE-DE-SOLDE 66 - - 30000360
      020080 03 Y7XRRD-DA-VLR PIC X(8). 30000370
      020090* DATE VALEUR 74 28/12/94 30000380
      020100 03 Y7XRRD-DA-ARR PIC X(8). 30000390
      020110* DATE ARRETE 82 - - 30000400
      020120 03 Y7XRRD-NO-STR-OPE PIC X(6). 30000410
      020130* NUMERO STRUCTURE OPERATIONNELLE 90 - - 30000420
      020140 03 Y7XRRD-NO-REF-TNL-MED PIC X(4). 30000430
      030010* NUMERO REFERENCE TERMINAL MEDIA 96 03/02/98 30000440
      030020 03 Y7XRRD-NO-LOT PIC X(3). 30000450
      030030* NUMéRO DE LOT 100 13/10/97 30000460
      030040 03 Y7XRRD-TDS-LIBELLES. 30000470
      030050* FAMILLE MONTANTS OPERATION T.DE.SOLDE 103 05/02/98 30000480
      030060 05 Y7XRRD-LIB-CLI-OPE-1 PIC X(50). 30000490
      030070* LIBELLE CLIENT OPERATION SX:-1 103 03/02/98 30000500
      030080 05 Y7XRRD-LIB-ITE-OPE PIC X(32). 30000510
      030090* LIBELLE INTERNE OPERATION 153 - - 30000520
      030100 05 Y7XRRD-LIB-CT-CLI PIC X(32). 30000530
      030110* LIBELLE COURT CLIENT 185 - - 30000540
      030120 03 Y7XRRD-CD-UTI-LIB-CPL PIC X(1). 30000550
      030130* Code utilisation libellés compl. 217 28/12/94 30000560
      030140 03 Y7XRRD-IDC-COM-OPE PIC X(1). 30000570
      040010* INDICATEUR COMMISSION OPERATION 218 03/02/98 30000580
      040020 03 Y7XRRD-CD-TY-OPE-NIV-1 PIC X(1). 30000590
      040030* CODE TYPE OPERATION NIVEAU UN 219 - - 30000600
      040040 03 Y7XRRD-CD-TY-OPE-NIV-2 PIC X(2). 30000610
      040050* CODE TYPE OPERATION NIVEAU DEUX 220 - - 30000620
      040060 03 FILLER PIC X(7). 30000630
      040070* 222 30000640
      040080 03 Y7XRRD-TDS-LIB-SUPPL. 30000650
      040090* FAMILLE LIBELLES COMPLEMENTAIRES T.D.S 229 17/02/98 30000660
      040100 05 Y7XRRD-LIB-CLI-OPE-02 PIC X(50). 30000670
      040110* LIBELLE CLIENT OPERATION SX:-02 229 03/02/98 30000680
      040120 05 Y7XRRD-LIB-CLI-OPE-03 PIC X(50). 30000690
      040130* LIBELLE CLIENT OPERATION SX:-03 279 - - 30000700

    Resulting Pig schema:
      A: {
        key: bytearray,
        value: bytearray,
        parsed: (
          Y7XRRD_Y7XRRDC: bytearray,
          Y7XRRD_ARTDS_CLE_SECD: bytearray,
          Y7XRRD_NO_CCM: chararray,
          Y7XRRD_NO_PSE: chararray,
          Y7XRRD_CATEGORIE: chararray,
          Y7XRRD_RANG: chararray,
          Y7XRRD_NO_ORDRE: chararray,
          Y7XRRD_DA_TT_C2: chararray,
          Y7XRRD_NO_ORDRE_ENR_C2: long,
          Y7XRRD_MT_OPE_TDS: double,
          Y7XRRD_CD_DVS_ORI_OPE: chararray,
          Y7XRRD_CD_DVS_GTN_TDS: chararray,
          Y7XRRD_MT_CNVS_OPE: double,
          Y7XRRD_IDC_ATN_ORI_MT: chararray,
          Y7XRRD_SLD_AV_IMPT: double,
          Y7XRRD_DA_OPE_TDS: chararray,
          Y7XRRD_DA_VLR: chararray,
          Y7XRRD_DA_ARR: chararray,
          Y7XRRD_NO_STR_OPE: chararray,
          Y7XRRD_NO_REF_TNL_MED: chararray,
          Y7XRRD_NO_LOT: chararray,
          Y7XRRD_TDS_LIBELLES: bytearray,
          Y7XRRD_LIB_CLI_OPE_1: chararray,
          Y7XRRD_LIB_ITE_OPE: chararray,
          Y7XRRD_LIB_CT_CLI: chararray,
          Y7XRRD_CD_UTI_LIB_CPL: chararray,
          Y7XRRD_IDC_COM_OPE: chararray,
          Y7XRRD_CD_TY_OPE_NIV_1: chararray,
          Y7XRRD_CD_TY_OPE_NIV_2: chararray,
          FILLER: chararray,
          Y7XRRD_TDS_LIB_SUPPL: bytearray,
          Y7XRRD_LIB_CLI_OPE_02: chararray,
          Y7XRRD_LIB_CLI_OPE_03: chararray
        )
      }
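Several amounts in this copybook are declared PIC S9(13)V9(2) COMP-3, i.e. packed decimal: 15 digits plus a sign nibble stored in 8 bytes. The following Python sketch shows how such a field is generally decoded. It is not the code of COBOLBinaryHelper, and `unpack_comp3` is a hypothetical name. Where the slide's schema maps these fields to double, the sketch returns a Decimal so the value stays exact.

```python
from decimal import Decimal


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode an IBM packed-decimal (COMP-3) field.

    Each byte holds two BCD digits, except the last one, whose low nibble is
    the sign: 0xD (or 0xB) means negative, the other values 0xA-0xF positive.
    `scale` is the number of implied decimals (2 for V9(2)).
    """
    if not raw:
        raise ValueError("empty field")
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    sign_nibble = raw[-1] & 0x0F
    if any(d > 9 for d in digits) or sign_nibble < 0x0A:
        raise ValueError("not a valid packed-decimal field")
    unscaled = int("".join(str(d) for d in digits))
    if sign_nibble in (0x0B, 0x0D):
        unscaled = -unscaled
    return Decimal(unscaled).scaleb(-scale)
```

For instance, the 8 bytes 00 00 00 00 12 34 56 7D with a scale of 2 represent the amount -12345.67.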
  • 34. com.arkea.commons.pig.DB2UnloadBinaryHelper
      REGISTER ddl-load.jar;
      A = LOAD '$data' USING cacp.SequenceFileLoadFunc('cacp.DB2UnloadBinaryHelper','[PREFIX:]TABLE');

    .ddl
      CREATE TABLE SHDBA.TBDCOLS
        (COL_CHAR CHAR(4) FOR SBCS DATA WITH DEFAULT NULL,
         COL_DECIMAL DECIMAL(15, 2) WITH DEFAULT NULL,
         COL_NUMERIC DECIMAL(15, 0) WITH DEFAULT NULL,
         COL_SMALLINT SMALLINT WITH DEFAULT NULL,
         COL_INTEGER INTEGER WITH DEFAULT NULL,
         COL_VARCHAR VARCHAR(50) FOR SBCS DATA WITH DEFAULT NULL,
         COL_DATE DATE WITH DEFAULT NULL,
         COL_TIME TIME WITH DEFAULT NULL,
         COL_TIMESTAMP TIMESTAMP WITH DEFAULT NULL) ;

    .load
      TEMPLATE DFEM8ERT
        DSN('XXXXX.PPSDR.B99BD02.SBDCOLS.REC')
        DISP(OLD,KEEP,KEEP)
      LOAD DATA INDDN DFEM8ERT LOG NO RESUME YES
        EBCDIC CCSID(01147,00000,00000)
        INTO TABLE "SHDBA"."TBDCOLS"
        WHEN(00001:00002) = X'003F'
        ( "COL_CHAR" POSITION( 00004:00007) CHAR(00004) NULLIF(00003)=X'FF',
          "COL_DECIMAL" POSITION( 00009:00016) DECIMAL NULLIF(00008)=X'FF',
          "COL_NUMERIC" POSITION( 00018:00025) DECIMAL NULLIF(00017)=X'FF',
          "COL_SMALLINT" POSITION( 00027:00028) SMALLINT NULLIF(00026)=X'FF',
          "COL_INTEGER" POSITION( 00030:00033) INTEGER NULLIF(00029)=X'FF',
          "COL_VARCHAR" POSITION( 00035:00086) VARCHAR NULLIF(00034)=X'FF',
          "COL_DATE" POSITION( 00088:00097) DATE EXTERNAL NULLIF(00087)=X'FF',
          "COL_TIME" POSITION( 00099:00106) TIME EXTERNAL NULLIF(00098)=X'FF',
          "COL_TIMESTAMP" POSITION( 00108:00133) TIMESTAMP EXTERNAL NULLIF(00107)=X'FF'
        )

    Resulting Pig schema:
      A: {
        key: bytearray,
        value: bytearray,
        parsed: (
          COL_CHAR: chararray,
          COL_DECIMAL: double,
          COL_NUMERIC: long,
          COL_SMALLINT: long,
          COL_INTEGER: long,
          COL_VARCHAR: chararray,
          COL_DATE: chararray,
          COL_TIME: chararray,
          COL_TIMESTAMP: chararray
        )
      }

    Can also handle DB2 UDB unloads (done using hpu)
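The load script is what makes the unload record self-describing: each column has a POSITION(start:end) range and a one-byte null indicator tested by NULLIF(pos)=X'FF'. A minimal Python sketch of reading the SMALLINT and INTEGER columns under those rules follows. It is an illustration, not the DB2UnloadBinaryHelper implementation; `read_nullable_int` is a hypothetical name, and the big-endian two's-complement layout is assumed from the z/OS origin of the data.

```python
from typing import Optional


def read_nullable_int(record: bytes, null_pos: int, start: int, end: int) -> Optional[int]:
    """Read a signed integer column from a DB2 unload record.

    Positions are 1-based and inclusive, mirroring POSITION(start:end) and
    NULLIF(null_pos)=X'FF' in the load script.
    """
    if record[null_pos - 1] == 0xFF:
        return None  # null indicator byte is set, the column is NULL
    field = record[start - 1:end]
    if len(field) not in (2, 4):  # SMALLINT is 2 bytes, INTEGER is 4
        raise ValueError("unexpected integer width")
    # Assumption: big-endian two's complement, as stored on z/OS.
    return int.from_bytes(field, "big", signed=True)
```

With the script above, COL_SMALLINT would be read as read_nullable_int(record, 26, 27, 28) and COL_INTEGER as read_nullable_int(record, 29, 30, 33).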
  • 35. we're Thrifty too...
      REGISTER thrift-generated.jar;
      A = LOAD '$data' USING cacp.SequenceFileLoadFunc('cacp.ThriftBinaryHelper','CLASS');

    Thrift definition:
      struct Redirection {
        1: string alias,
        2: string url,
        3: string email,
        4: i64 timestamp,
        5: i64 lastupdate,
        6: list<string> params,
        7: bool external = 1,
        8: i64 owner,
        9: string user,
      }

    Resulting Pig schema:
      A: {
        key: bytearray,
        value: bytearray,
        parsed: (
          alias: chararray,
          url: chararray,
          email: chararray,
          timestamp: long,
          lastupdate: long,
          params: (),
          external: long,
          owner: long,
          user: chararray
        )
      }

    ... and also use MySQL ...
      REGISTER mysql-ddl.jar;
      A = LOAD '$data' USING cacp.SequenceFileLoadFunc('cacp.MySQLBinaryHelper','TABLE');

    ... etc etc etc ...