Open Source ETL using Talend Open Studio

                                    Luís Santos
                                luis@luissantos.pt



                                 February 14, 2013




Luís Santos luis@luissantos.pt      Open Source ETL   February 14, 2013
Overview

1    Who am I?

2    What is ETL?

3    ETL Software Suites

4    Talend Open Studio for Data Integration

5    Hands on

6    Conclusion



Warning!!!




This presentation was created using LaTeX
                  Why?
             Because I can!




Who am I?




          Software Engineer and Mathematics Student
          Open Source addict
          PHP and Java Developer




What is ETL?


     In computing, Extract, Transform and Load (ETL) refers to a
     process in database usage and especially in data warehousing
     that involves:
             Extracting data from outside sources
             Transforming it to fit operational needs (which can include
             quality levels)
             Loading it into the end target (database, more specifically,
             operational data store, data mart or data warehouse)



        (2013, http://en.wikipedia.org/wiki/Extract,_transform,_load)
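
The three steps in the definition above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how Talend implements it: the CSV text, the `sales` table, and the "amount is required" quality rule are all made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would be a file, a database or a web service.
RAW_CSV = """name,country,amount
alice,pt,10.5
bob,PT,
carol,es,7
"""


def extract(text):
    """Extract: read rows from an outside source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows that fail a quality rule and normalize the rest."""
    cleaned = []
    for row in rows:
        if not row["amount"]:  # assumed quality rule: amount is required
            continue
        cleaned.append((row["name"].title(), row["country"].upper(), float(row["amount"])))
    return cleaned


def load(rows, conn):
    """Load: write the transformed rows into the end target (here, SQLite)."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT name, country, amount FROM sales ORDER BY name").fetchall()
print(loaded)
```

The row for `bob` has no amount, so it is rejected in the transform step and never reaches the target table.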




ETL Software Suites




      Pentaho Data Integration (Kettle)
      SQL Server Integration Services
      Talend Open Studio for Data Integration
      etc...




Talend Open Studio for Data Integration


Talend Open Studio is a set of tools for developing, testing and deploying
application integration projects.
      Talend Open Studio for Big Data
      Bonita Open Solution (BPM)
      Talend Open Studio for Data Integration
      Talend Open Studio for Data Quality
      Talend ESB
      Talend Open Studio for MDM




Datasources (Extract and Load)




  MySQL, MSSQL, Oracle, SQLite, FirebirdSQL, XLS, CSV, XML, SOAP,
                  REST, HTTP, FTP, SSH, IMAP




Transformers (Transform)




      Sort data
      Convert data
      Join data across datasources
      Filter data
      Fuzzy search
      Normalize and Denormalize data
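
To make the list above concrete, here is a rough sketch of each kind of transformation on in-memory rows. It uses only the Python standard library and is not Talend code; the `customers` and `orders` data are hypothetical, and `difflib` stands in for whatever fuzzy matching a real tool provides.

```python
import difflib

# Hypothetical rows from two different datasources.
customers = [
    {"id": 1, "name": "Alice", "city": "Lisboa"},
    {"id": 2, "name": "Bob", "city": "Porto"},
]
orders = [
    {"customer_id": 2, "total": "30.00"},
    {"customer_id": 1, "total": "12.50"},
    {"customer_id": 1, "total": "3.20"},
]

# Convert: totals arrive as text and become numbers.
for order in orders:
    order["total"] = float(order["total"])

# Cross (join): attach the customer name to each order via customer_id.
by_id = {c["id"]: c for c in customers}
joined = [{"name": by_id[o["customer_id"]]["name"], "total": o["total"]} for o in orders]

# Filter, then sort.
large = [row for row in joined if row["total"] >= 10]
large.sort(key=lambda row: row["total"])

# Fuzzy search: find the closest known city for a misspelled value.
cities = [c["city"] for c in customers]
match = difflib.get_close_matches("Lisbao", cities, n=1, cutoff=0.6)

# Denormalize: collapse one row per order into one entry per customer.
totals_by_name = {}
for row in joined:
    totals_by_name.setdefault(row["name"], []).append(row["total"])

print(large, match, totals_by_name)
```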




Where and how?



     Where?
             Multi-platform (Linux, macOS, BSD-*, even Windows)
             You just need a JVM (Java Virtual Machine)
     How?
             Run it from your favorite programming language by launching it as an external process
             From the command line
             From your JVM-based application (Java, Groovy, JRuby)
             As web services running on top of a Java application server (Tomcat, GlassFish)
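
The first two options above come down to starting the job as a child process. A minimal sketch of that idea follows. The launcher path `./export_customers_run.sh` and the `output_dir` parameter are hypothetical, and the `--context_param name=value` form is an assumption about the launcher script generated when a job is exported, so check the script produced for your own job. A stand-in Python process is used so the sketch runs without a real exported job.

```python
import subprocess
import sys


def build_job_command(launcher, context_params):
    """Build the argument list for a job launcher script.

    The --context_param name=value form is an assumption about the exported
    launcher; verify it against the script generated for your own job.
    """
    command = [launcher]
    for name, value in context_params.items():
        command.append("--context_param")
        command.append(f"{name}={value}")
    return command


def run_command(command):
    """Run the command as a child process and return (exit code, stdout)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout


# Hypothetical launcher path and parameter; not taken from the slides.
job_command = build_job_command("./export_customers_run.sh", {"output_dir": "/tmp/out"})
print(job_command)

# Stand-in process so the sketch runs without a real exported job.
exit_code, output = run_command([sys.executable, "-c", "print('job finished')"])
print(exit_code, output.strip())
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a parameter value contains spaces. The same pattern applies in any language that can spawn a process and read its exit code.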




Hands on




     Querying data
     Joining data from multiple datasources
     Filtering and sorting data
     Exporting data
     Deploying your job
     Calling it from PHP




Database Schema




Example




“With great power comes great responsibility.”
                                         (Voltaire)




The End
    email: luis@luissantos.pt
    twitter: @santosluis87
    linkedin: https://www.linkedin.com/in/luissantos87



