Intro to Talend Open Studio for Data Integration
Philip Yurchuk
http://philip.yurchuk.com
What is Talend?
• Eclipse-based visual programming editor
• Generates executable Java code
• Jobs can run standalone or embedded (no special server)
• Batch or interactive (user input)
What is ETL?
• Extract: suck up data
• Transform: mess with it
• Load: blow it out
• Batch, integration, migration, etc.
Extract from/load to where?
• Over 600 components
• Over 450 connectors
• Allows multiple inputs/outputs in a single job
Connectors
• Flat files
  • Delimited (tab, CSV…)
  • XML
  • JSON
  • Excel
  • Positional
  • Apache HTTP logs, HL7...
• Applications/Platforms
  • Alfresco
  • Microsoft Dynamics (CRM, AX)
  • SAP
  • Sage ERP X3
  • Salesforce
  • SugarCRM
Connectors (continued)
• Relational Databases
  • MySQL
  • PostgreSQL
  • MS SQL
  • Oracle
  • Many more
• NoSQL/Columnar/OLAP/Other
  • Amazon Redshift
  • Greenplum
  • Hive
  • OLAP cubes
  • LDAP
  • VectorWise
  • Teradata
  • More in Big Data ed.
How do we transport data?
• File system
• FTP
• SFTP/SCP
• Web service (SOAP, REST)
• HTTP
• Mail, POP
• XMLRPC, Sockets, JMS, RSS...
Other Components
• Process data: join, filter, aggregate
• Flow control: loops, job invocation
• Logs, statistics
• Code: Java, Groovy
  • On row data or standalone
  • Can load libraries
Demo
Nifty Components
• FuzzyMatch - calculate Levenshtein distance or phonetic similarity
• IntervalMatch - perform lookup/join based on values falling within an interval
• Replace, ReplaceList - search and replace, substitution
• UniqRow - output distinct rows based on defined key columns
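
To make the FuzzyMatch bullet concrete, here is a minimal sketch of Levenshtein distance in plain Java. It is not Talend's generated code; the class and method names (LevenshteinSketch, distance) are made up for illustration. It only shows the kind of edit-distance score such a component could compare against a threshold.

```java
// Illustrative sketch only; names are hypothetical, not part of Talend.
class LevenshteinSketch {

    /**
     * Returns the minimum number of single-character insertions,
     * deletions, or substitutions needed to turn a into b.
     */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1]; // distances for the previous row of the DP table
        int[] curr = new int[b.length() + 1]; // distances for the row being filled in
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // substitution or match
            }
            int[] tmp = prev; // reuse the two arrays instead of allocating a full table
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```

A lower score means a closer match, so a fuzzy lookup would typically accept rows whose distance falls under some chosen maximum.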
More Nifty Components
• XMLMap - allows joins, column or row filtering, transformations, and multiple outputs
• Normalize/Denormalize - split a delimited string into rows, or join rows into one delimited string
• AggregateRow - GROUP BY; aggregates rows on a column using min, max, sum, and other functions
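
For readers who think in code rather than SQL, the GROUP BY behavior described for AggregateRow can be sketched with Java streams. This is an analogy, not what Talend generates; the Sale row type and the sumByRegion method are hypothetical names invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative sketch only; Sale and sumByRegion are hypothetical names.
class AggregateRowSketch {

    /** A minimal input row: a group-by key and a value to aggregate. */
    static final class Sale {
        final String region;
        final int amount;

        Sale(String region, int amount) {
            this.region = region;
            this.amount = amount;
        }
    }

    /** Equivalent in spirit to: SELECT region, SUM(amount) ... GROUP BY region. */
    static Map<String, Integer> sumByRegion(List<Sale> rows) {
        return rows.stream().collect(
                Collectors.groupingBy(
                        s -> s.region,                        // group-by column
                        TreeMap::new,                         // sorted keys for stable output
                        Collectors.summingInt(s -> s.amount)  // aggregate function
                ));
    }

    public static void main(String[] args) {
        List<Sale> rows = Arrays.asList(
                new Sale("east", 10), new Sale("west", 5), new Sale("east", 7));
        System.out.println(sumByRegion(rows)); // prints {east=17, west=5}
    }
}
```

Swapping summingInt for another collector (for example Collectors.counting()) changes the aggregate without touching the grouping.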
Tips and Tricks
• Use CamelCase job names for embedded jobs.
  • Or prefix names with the ETL phase and order of execution.
• Whenever appropriate (esp. for inserting data), use the schema from the repository.
• When connecting components, propagating changes to a DB component will change it to a built-in schema, which won't get updated.
Tips and Tricks
• Propagating changes to a DB component will change it to a built-in schema, which won't get updated after repo changes.
• On the other hand, remember that for lookup/join (i.e., SELECT) queries you can modify the query to only select the fields you need. Propagating the schema is useful then.
Tips and Tricks
• Failure handling subjob:
  • It's an unconnected subjob (no triggers point to it).
  • Use LogCatcher to catch and record component failures.
  • Record the failure in a DB, file, email, etc.
  • Add a rollback component to undo DB changes if necessary. You may need to do this in the job itself if strategic placement is needed.
Tips and Tricks
• In Java expressions, use methods, not operators. E.g., concat(String) instead of the + operator, equals(Object) instead of ==.
• Technical components (like hash maps) are hidden by default. See: http://www.talendforge.org/forum/viewtopic.php?pid=110860
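
The equals(Object) advice matters because == on Java strings compares object references, not characters. The sketch below uses made-up values and a hypothetical class name (StringCompareSketch); new String(...) stands in for a value that arrives at runtime, such as a field read from a row.

```java
// Illustrative sketch only; the class name and values are hypothetical.
class StringCompareSketch {

    /** Null-safe text comparison: true only when both strings hold the same characters. */
    static boolean sameText(String a, String b) {
        return a != null && a.equals(b);
    }

    public static void main(String[] args) {
        String fromRow = new String("ACTIVE"); // simulates a value created at runtime
        String literal = "ACTIVE";

        System.out.println(fromRow == literal);          // prints false: different objects
        System.out.println(fromRow.equals(literal));     // prints true: same characters
        System.out.println(sameText(null, literal));     // prints false, no NullPointerException
        System.out.println("Hello, ".concat("Talend"));  // prints Hello, Talend
    }
}
```

Putting the null check (or the literal) on the left side of equals avoids a NullPointerException when a column is empty.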
Tips and Tricks
• Use a context for job variables.
  • Note you can specify a type for each variable.
  • You can read values from a file or database, or pass in a context if the job is embedded in Java code.
Tips and Tricks
• For multi-host deployment:
  • Export the job with a “bootstrap” context that has all variables, but populates only a context config location that is the same for all machines.
  • The context config file has all values required for that host, e.g. the test DB connection for the test machine.
  • Windows resolves a leading “/” to the root of the current drive, usually the main system drive, so “/Data/” will translate to C:\Data\
  • Be mindful of file permissions for sensitive context data (e.g., DB password)
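
The bootstrap idea can be sketched in plain Java, assuming the per-host context config is a simple key=value file. This is an assumption made for illustration; the class name, the keys (db_host, db_port), and the host name are all hypothetical, and in a real job Talend's own context handling would do the loading.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Illustrative sketch only; names, keys, and values are hypothetical.
class ContextConfigSketch {

    /** Parses key=value text into a Properties object. */
    static Properties parseContext(String text) {
        Properties context = new Properties();
        try {
            context.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return context;
    }

    /** Reads the host-specific config from the one location every machine shares. */
    static Properties loadContext(Path configFile) {
        try {
            byte[] bytes = Files.readAllBytes(configFile);
            return parseContext(new String(bytes, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stands in for the contents of the per-host file.
        Properties ctx = parseContext("db_host=test-db.example.com\ndb_port=5432\n");
        System.out.println(ctx.getProperty("db_host") + ":" + ctx.getProperty("db_port"));
        // prints test-db.example.com:5432
    }
}
```

One design note for this format: java.util.Properties treats a backslash as an escape character, so path values are safer written with forward slashes.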
Tips and Tricks
• Use “Bulk” output components when possible.
• For transactional behavior:
  • Start the job with a DB connection
  • Check “use existing connection” in all relevant components
  • Check “Die on error” in all relevant components
  • End the job with a commit component
Room for Improvement
• UI stability
• Documentation
Books
• Getting Started with Talend Open Studio for Data Integration by Jonathan Bowen
• Talend Open Studio Cookbook by Rick Daniel Barton
• Big Data book coming…
Talend Forge
• http://www.talendforge.org/
• Forum - super helpful
• Exchange - free community components!
• Tutorials
• Bug tracker
• Source code
Talend Resources
• http://www.talend.com/resources
• Help Center
• Knowledge Base
• Webinars, screencasts
• Tutorials
• Docs are on the download page
  • And by pressing F1 on a component
Questions?
Compliments?
Consulting gigs?
• Contact me:
  • philip@yurchuk.com
  • http://philip.yurchuk.com
  • http://www.linkedin.com/in/philipyurchuk/
Thank You!
