Scalable ETL with Talend and Hadoop
Talend, Global Leader in Open Source Integration Solutions
  
Cédric Carbone – Talend CTO
Twitter : @carbone
ccarbone@talend.com
Why talk about ETL with Hadoop?
Hadoop is complex
BI consultants don’t have the skills to manipulate Hadoop
➜ The biggest issue in a Hadoop project is finding skilled Hadoop engineers
➜ An ETL tool like Talend can help to democratize Hadoop
Trying to get from this… to this…
  

Why Talend…
Talend generates code that is executed within MapReduce. This open
approach removes the limitation of a proprietary “engine” and provides a
unique and powerful set of tools for big data.
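To make the idea of code generation concrete, here is a minimal Python sketch that renders a SQL statement from a declarative mapping. It is not Talend's generator: the function name `generate_select`, the table `orders` and the column expressions are all hypothetical, chosen only to show that the output is ordinary code the target platform runs itself.

```python
def generate_select(table, mapping):
    """Render a SELECT statement from {output_column: source_expression} pairs.

    Hypothetical helper for illustration; no proprietary engine is involved,
    the result is plain SQL text that any SQL engine could execute.
    """
    columns = ",\n  ".join(f"{expr} AS {name}" for name, expr in mapping.items())
    return f"SELECT\n  {columns}\nFROM {table}"


# Example mapping: two output columns derived from source expressions.
sql = generate_select(
    "orders",
    {"revenue": "amount * quantity", "label": "UPPER(category)"},
)
print(sql)
```

Because the generated text is standard SQL, it can be inspected, versioned and debugged like hand-written code.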
BIG Data Management
Big Data Production: RDBMS, Analytical DB, NoSQL DB, ERP/CRM, SaaS, Social Media, Web Analytics, Log Files, RFID, Call Data Records, Sensors, Machine-Generated
Big Data Management: Big Data Integration, Big Data Quality
Big Data Consumption: Mining, Analytics
Storage, Processing, Filtering, Parsing, Checking, Search, Enrichment

Turn Big Data into
actionable information
Two methods for inserting data quality into a
big data job
1. Pipelining: as part of the load process
2. Load the cluster, then implement and execute a data quality MapReduce job
Extract – Transform – Load (E-T-L)
Extract – Improve/Cleanse – Load (E-DQ-L)
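The two methods above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `cleanse` rule, the record fields and the in-memory `store` lists are hypothetical stand-ins for a real DQ component and a real cluster.

```python
from typing import Iterable

Record = dict[str, str]


def cleanse(record: Record) -> Record:
    """Hypothetical DQ rule: trim whitespace and lower-case the email field."""
    return {
        "name": record["name"].strip(),
        "email": record["email"].strip().lower(),
    }


def load_with_pipelining(source: Iterable[Record], store: list[Record]) -> None:
    """Method 1: cleanse each record on its way into the store (E-DQ-L)."""
    for record in source:
        store.append(cleanse(record))


def load_then_improve(source: Iterable[Record], store: list[Record]) -> None:
    """Method 2: load raw records first, then run a separate DQ pass in place."""
    store.extend(source)                      # plain load, no quality checks
    store[:] = [cleanse(r) for r in store]    # later job, run where the data lives


raw = [{"name": " Ada ", "email": "ADA@Example.com "}]
pipelined: list[Record] = []
improved: list[Record] = []
load_with_pipelining(raw, pipelined)
load_then_improve(raw, improved)
print(pipelined, improved)
```

Both paths end with the same cleansed records; the difference is where and when the DQ work runs.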
Pipelining: data quality with big data

CRM, ERP, Finance, Social Networking, Mobile Devices ➜ DQ ➜ Big Data
•  Use traditional data quality tools
•  Once and done
  
Big data alternative: Load and improve within
the cluster
CRM, ERP, Finance, Social Networking, Mobile Devices ➜ Big Data (DQ)
•  Load first, improve later
•  Complex matching cannot be done outside
  
One key DQ rule: Match
Find duplicates within Hadoop
➜ Today’s matching algorithms are processor-intensive
➜ Tomorrow’s matching algorithms could be more precise and more intensive
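As a rough illustration of why matching suits a MapReduce-style job, the sketch below groups records by a blocking key in a map step and compares pairs inside each group in a reduce step. It runs in plain Python, not on Hadoop. The blocking key (first letter of the name), the 0.85 threshold and the sample records are assumptions made for the example, and `difflib.SequenceMatcher` stands in for whatever matching algorithm a real DQ tool would use.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def map_phase(records):
    """Emit (blocking_key, record); the key is the lower-cased first letter of the name."""
    for record in records:
        yield record["name"].strip().lower()[:1], record


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(group, threshold=0.85):
    """Compare every pair inside one block and yield the ids of likely duplicates."""
    for a, b in combinations(group, 2):
        score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
        if score >= threshold:
            yield a["id"], b["id"]


records = [
    {"id": 1, "name": "Jon Smith"},
    {"id": 2, "name": "John Smith"},
    {"id": 3, "name": "Jane Doe"},
]
matches = [
    pair
    for group in shuffle(map_phase(records)).values()
    for pair in reduce_phase(group)
]
print(matches)
```

The pairwise comparison inside each block is the processor-intensive part; blocking keeps it bounded and lets each block run on a different node.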
What is Hadoop?
➜ The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
➜ Java framework for storing data and running data transformations on large clusters of commodity hardware
➜ Licensed under the Apache v2 license
➜ Created from Google's MapReduce, BigTable and Google File System (GFS) papers
  
Hadoop ecosystem

➜ Sqoop
➜ Pig
➜ Hive
➜ Oozie
➜ HCatalog
➜ Mahout
➜ HBase
  
Talend for Big Data: Hadoop Story
4.0 [April 2010]: Put or get data into Hadoop through HDFS connectors
4.1 [Oct 2010]: Hadoop Query (Hive); bulk load & fast export to Hadoop (Sqoop)
4.2 [May 2011]: Transformation (Pig)
5.0 [Nov 2011]: HBase NoSQL; extend our tPig*
5.1 [May 2012]: Metadata (HCatalog); deployment & scheduling (Oozie); embedded into HDP
5.2 [Oct 2012]: Visual ELT mapping (Hive); data lineage & impact analysis
5.3 [June 2013]: Visual Pig mapping; machine learning (Mahout); native MapReduce code generation
	
  
Democratizing Integration with Data
Integration tools for Big Data	
  
WordCount

➜ WordCount
•  Comes with Hadoop
•  “First” demo that everyone tries!
➜ How-to in Talend Big Data
•  Simple read, count, load results
•  No coding, just drag-n-drop
•  Runs remotely
  
Map Reduce
Data Node 1, Data Node 2
In Java
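The Java listing from the slide is not part of this transcript. As a stand-in, here is a minimal sketch of the WordCount logic in plain Python that mimics the map, shuffle and reduce steps in a single process. It does not use Hadoop, and the function names and sample lines are illustrative only.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce step: sum all the 1s emitted for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, group by key (the shuffle), then reduce, all in memory."""
    grouped = defaultdict(list)
    for line in lines:
        for word, one in mapper(line):
            grouped[word].append(one)
    return dict(reducer(word, counts) for word, counts in grouped.items())


counts = word_count(["Hadoop is complex", "Talend generates code for Hadoop"])
print(counts)
```

On a cluster, the mapper would run on each data node against its local blocks and the framework would handle the grouping between the two steps.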
WordCount: How-to with a Graphical ETL
Thank You!
Let us show you…
  
Talend Open Studio
Generate Pure Map Reduce
Pig Latin Script generation

FOREACH tPigMap_1_out1_RESULT GENERATE $4 AS Revenu, $6 AS Label
HiveQL generation
HDFS Management and Sqoop
Apache Mahout
➜ Big Data can also be a blob of data to an organization
➜ Apache Mahout provides algorithms for understanding data (data mining)
➜ “You don’t know what you don’t know”, and Mahout will tell you.
Metadata Management
➜ Centralized metadata repository for Hadoop cluster, HDFS, Hive…
•  Versioning
•  Impact Analysis and Data Lineage
➜ HCatalog across
•  HDFS
•  Hive
•  Pig
  
  
Choose your Hadoop distro
➜ Widely adopted
➜ Management tooling is not OSS
➜ Fully open source
➜ Strong developer ecosystem
➜ More proprietary
➜ GTM partner with AWS
➜ Many more are coming
  
Choose your Hadoop distro
Provide tooling:
➜ For installation
➜ For server monitoring
But
➜ No GUI for parsing, transforming, easily loading. No data management
  
Parse and Standardize
➜ Big Data is not always structured
➜ Correct big data so that data conforms to the same rules
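A small sketch of what "conforming to the same rules" can mean in practice: dates arriving in several layouts are parsed and rewritten in one standard form. The list of accepted formats and the sample values are assumptions for the example, not rules taken from the deck.

```python
from datetime import datetime

# Hypothetical set of layouts seen in the incoming data.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def standardize_date(raw):
    """Return the date as ISO 8601 text, or None when no known format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next layout
    return None


cleaned = [standardize_date(v) for v in ["2013-06-01", " 01/06/2013", "20130601", "soon"]]
print(cleaned)
```

Values that cannot be parsed come back as `None`, so they can be counted and routed to a reject flow instead of silently polluting the cluster.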
Profiling & Monitor DQ
Implications for Integration
DATA
Challenge: Information explosion increases complexity of integration and requires governance to maintain data quality
Requirement: Information processing must scale
Implications for Integration
APPLICATION
	
  	
  

Challenge: Brittle, point-to-point connections cannot adapt to evolving business requirements, new channels, and quickly changing topologies
Requirement: Application architecture must scale
	
  
Implications for Integration
PROCESS
	
  

Challenge: Competitive market forces drive frequent process changes and increased process complexity
Requirement: Business processes must scale
Implications for Integration
Challenge: Interdependencies across data, applications and processes require more resources and budget
Requirement: Resources and skillsets must scale
	
  
	
  
Integration at Any Scale
True scalability for
•  Any integration challenge
•  Any data volume
•  Any project size
Enables integration convergence
	
  
Technology that Scales

STANDARDS-BASED: Java, SQL, Camel, MapReduce, …
Easy to learn, flexible to adopt, reduces vendor lock-in
CODE GENERATOR
No black-box engine means faster maintenance and deployment with improved quality
The “engine” for Big Data is “Hadoop”, making it uniquely run at infinite scale.
  
Technology Continuum…

Google Big Query

ELT (SQL CodeGen)
•  Teradata
•  Netezza
•  Vertica

•  Visual Wizard
•  Hive/Pig/MR CodeGen
•  HDFS, Sqoop, Oozie…

ETL
•  Java Code
•  Partitioning
•  Parallelization

NoSQL
•  MongoDB
•  Neo4J
•  Cassandra
•  HBase
•  Amazon Redshift
Talend Overview
At a glance

▶ Founded in 2005
▶ Offers highly scalable integration solutions addressing Data Integration, Data Quality, MDM, ESB and BPM
▶ Provides:
§ Subscriptions including 24/7 support and indemnification;
§ Worldwide training and services
▶ Recognized as the open source leader in each of its market categories

Talend today
➜ 400 employees in 7 countries with dual HQ in Los Altos, CA and Paris, France
➜ Over 4,000 paying customers across different industry verticals and company sizes
➜ Backed by Silver Lake Sumeru, Balderton Capital and Idinvest Partners
➜ High growth through a proven model

Brand Awareness: 20 million downloads
Adoption: 1,000,000 users
Market Momentum: +50 new customers / month
Monetization: 4,000 customers
  
Talend’s Unique Integration Solution
Best-of-Breed Solutions: Data Quality, Data Integration, MDM, ESB, BPM
+
Unified Platform:
1. Studio: comprehensive Eclipse-based user interface
2. Repository: consolidated metadata & project information
3. Deployment: web-based deployment & scheduling
4. Execution: same container for batch processing, message routing & services
5. Monitoring: single web-based monitoring console
=
Talend Unique Integration Solution
➜ Reduce costs
➜ Eliminate risk
➜ Reuse skills
➜ Economies of scale
➜ Incremental adoption
Recognized as the open source leader in each of its market categories by all industry analysts
Solutions that Scale

Big Data, Data Quality, Data Integration, MDM, ESB, BPM
TALEND UNIFIED PLATFORM: Studio, Repository, Deployment, Execution, Monitoring

UNIFIED PLATFORM
A shared foundation and toolset increases resource reuse

CONVERGED INTEGRATION
Use for any data, application and process project
The 6 Dimensions of BIG Data
Primary challenges
•  Volume
•  Velocity
•  Variety
And also
•  Complexity
•  Validation
•  Lineage
  
What Is BIG Data?
"Big data" is information of extreme size, diversity, complexity and need for rapid processing.
Ted Friedman - Information Infrastructure and Big Data Projects Key Initiative Overview - July 2011

•  3,500 tweets per second (June 2011)
•  1,000,000 transactions per day at Walmart
•  200 billion (200,000,000,000) intelligent devices (2015)
•  275 exabytes (275,000,000,000,000,000,000 bytes) of data flowing over the Internet each day (2020)
What is Big Data?	
  
How to define Big Data…
Hans Rosling – uses big data to analyze world health trends

Key Takeaway #1
  

volume, variety, velocity
Traditional Data Flows

CRM, ERP, Finance ➜ ETL / Data Quality ➜ Normalized Data ➜ Traditional Data Warehouse
•  Scheduled: daily or weekly, sometimes more frequently.
•  Volumes rarely exceed terabytes
Users: Business Analyst, Warehouse Administrator, Business User, Executives
  
The new world of big data

Social Networking, CRM, ERP, Mobile Devices, Transactions, Finance, Network Devices, Sensors ➜ Big Data
Key Takeaway #2

Forces us to think differently
Data driven business
data ➜ enables ➜ information ➜ supports ➜ decisions ➜ drives ➜ your business
Governance

Information provides value to the business
If you can't rely on your information then the result can be missed opportunities, or higher costs.
Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).
BIG data driven business

BIG data ➜ enables ➜ BIG information ➜ supports ➜ BIG decisions ➜ drives ➜ BIG business
Governance

Information provides value to the business
If you can't rely on your information then the result can be missed opportunities, or higher costs.
Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).
How is big data integration being used?
Use Cases
•  Recommendation Engine
•  Sentiment Analysis
•  Risk Modeling
•  Fraud Detection
•  Behavior Analysis
•  Marketing Campaign Analysis
•  Customer Churn Analysis
•  Social Graph Analysis
•  Customer Experience Analytics
•  Network Monitoring

BUT: to what level is DQ required for your use case?
  
Key Takeaway #3
  

Define your use case

More Related Content

What's hot

Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Rittman Analytics
 
Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)Jeffrey T. Pollock
 
2010.03.16 Pollock.Edw2010.Modern D Ifor Warehousing
2010.03.16 Pollock.Edw2010.Modern D Ifor Warehousing2010.03.16 Pollock.Edw2010.Modern D Ifor Warehousing
2010.03.16 Pollock.Edw2010.Modern D Ifor WarehousingJeffrey T. Pollock
 
Talend Summer '17 Release: New Features and Tech Overview
Talend Summer '17 Release: New Features and Tech OverviewTalend Summer '17 Release: New Features and Tech Overview
Talend Summer '17 Release: New Features and Tech OverviewTalend
 
Talend Introduction by TSI
Talend Introduction by TSITalend Introduction by TSI
Talend Introduction by TSIRemain Software
 
Oracle Data Integration CON9737 at OpenWorld
Oracle Data Integration CON9737 at OpenWorldOracle Data Integration CON9737 at OpenWorld
Oracle Data Integration CON9737 at OpenWorldJeffrey T. Pollock
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez Hortonworks
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks
 
Oracle Data Integration - Overview
Oracle Data Integration - OverviewOracle Data Integration - Overview
Oracle Data Integration - OverviewJeffrey T. Pollock
 
Evolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data ApplicationsEvolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data ApplicationsDataWorks Summit
 
Intelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockIntelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockJeffrey T. Pollock
 
The convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on HadoopThe convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on HadoopDataWorks Summit
 
Accelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache HiveAccelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache HiveDataWorks Summit
 
Talend Open Studio Data Integration
Talend Open Studio Data IntegrationTalend Open Studio Data Integration
Talend Open Studio Data IntegrationRoberto Marchetto
 
Oracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
Oracle PL/SQL 12c and 18c New Features + RADstack + Community SitesOracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
Oracle PL/SQL 12c and 18c New Features + RADstack + Community SitesSteven Feuerstein
 
Talend Data Integration Tutorial | Talend Tutorial For Beginners | Talend Onl...
Talend Data Integration Tutorial | Talend Tutorial For Beginners | Talend Onl...Talend Data Integration Tutorial | Talend Tutorial For Beginners | Talend Onl...
Talend Data Integration Tutorial | Talend Tutorial For Beginners | Talend Onl...Edureka!
 
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalPresentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalDiego Alberto Tamayo
 
Data Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationData Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationHortonworks
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...Hortonworks
 

What's hot (20)

Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
 
Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)
 
2010.03.16 Pollock.Edw2010.Modern D Ifor Warehousing
2010.03.16 Pollock.Edw2010.Modern D Ifor Warehousing2010.03.16 Pollock.Edw2010.Modern D Ifor Warehousing
2010.03.16 Pollock.Edw2010.Modern D Ifor Warehousing
 
Talend Summer '17 Release: New Features and Tech Overview
Talend Summer '17 Release: New Features and Tech OverviewTalend Summer '17 Release: New Features and Tech Overview
Talend Summer '17 Release: New Features and Tech Overview
 
Talend Introduction by TSI
Talend Introduction by TSITalend Introduction by TSI
Talend Introduction by TSI
 
Oracle Data Integration CON9737 at OpenWorld
Oracle Data Integration CON9737 at OpenWorldOracle Data Integration CON9737 at OpenWorld
Oracle Data Integration CON9737 at OpenWorld
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
Talend for big_data_intorduction
Talend for big_data_intorductionTalend for big_data_intorduction
Talend for big_data_intorduction
 
Oracle Data Integration - Overview
Oracle Data Integration - OverviewOracle Data Integration - Overview
Oracle Data Integration - Overview
 
Evolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data ApplicationsEvolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data Applications
 
Intelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockIntelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff Pollock
 
The convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on HadoopThe convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on Hadoop
 
Accelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache HiveAccelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache Hive
 
Talend Open Studio Data Integration
Talend Open Studio Data IntegrationTalend Open Studio Data Integration
Talend Open Studio Data Integration
 
Oracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
Oracle PL/SQL 12c and 18c New Features + RADstack + Community SitesOracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
Oracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
 
Talend Data Integration Tutorial | Talend Tutorial For Beginners | Talend Onl...
Talend Data Integration Tutorial | Talend Tutorial For Beginners | Talend Onl...Talend Data Integration Tutorial | Talend Tutorial For Beginners | Talend Onl...
Talend Data Integration Tutorial | Talend Tutorial For Beginners | Talend Onl...
 
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalPresentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
 
Data Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationData Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop Implementation
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
 

Viewers also liked

Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Hortonworks
 
Etl with talend (data integeration)
Etl with talend (data integeration)Etl with talend (data integeration)
Etl with talend (data integeration)pomishra
 
Big Data Expo 2015 - Talend Delivering Real Time
Big Data Expo 2015 - Talend Delivering Real TimeBig Data Expo 2015 - Talend Delivering Real Time
Big Data Expo 2015 - Talend Delivering Real TimeBigDataExpo
 
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...Kai Wähner
 
Talend Open Studio for Big Data (powered by Apache Hadoop)
Talend Open Studio for Big Data (powered by Apache Hadoop)Talend Open Studio for Big Data (powered by Apache Hadoop)
Talend Open Studio for Big Data (powered by Apache Hadoop)Modern Data Stack France
 
Simplifying Big Data ETL with Talend
Simplifying Big Data ETL with TalendSimplifying Big Data ETL with Talend
Simplifying Big Data ETL with TalendEdureka!
 
Talend Open Studio Fundamentals #1: Workspaces, Jobs, Metadata and Trips & Tr...
Talend Open Studio Fundamentals #1: Workspaces, Jobs, Metadata and Trips & Tr...Talend Open Studio Fundamentals #1: Workspaces, Jobs, Metadata and Trips & Tr...
Talend Open Studio Fundamentals #1: Workspaces, Jobs, Metadata and Trips & Tr...Gabriele Baldassarre
 
Setting the Record Straight: Drupal as an Enterprise Web Content Management S...
Setting the Record Straight: Drupal as an Enterprise Web Content Management S...Setting the Record Straight: Drupal as an Enterprise Web Content Management S...
Setting the Record Straight: Drupal as an Enterprise Web Content Management S...Acquia
 
Intro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data IntegrationIntro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data IntegrationPhilip Yurchuk
 
Open Source ETL using Talend Open Studio
Open Source ETL using Talend Open StudioOpen Source ETL using Talend Open Studio
Open Source ETL using Talend Open Studiosantosluis87
 
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksPowering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksHortonworks
 
Database vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative ReviewDatabase vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative ReviewHealth Catalyst
 
No-Ops で大量データ処理基盤を簡単に実現する
No-Ops で大量データ処理基盤を簡単に実現するNo-Ops で大量データ処理基盤を簡単に実現する
No-Ops で大量データ処理基盤を簡単に実現するKiyoshi Fukuda
 
Implementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceImplementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceHortonworks
 

Viewers also liked (20)

Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
 
Etl with talend (data integeration)
Etl with talend (data integeration)Etl with talend (data integeration)
Etl with talend (data integeration)
 
Big Data Expo 2015 - Talend Delivering Real Time
Big Data Expo 2015 - Talend Delivering Real TimeBig Data Expo 2015 - Talend Delivering Real Time
Big Data Expo 2015 - Talend Delivering Real Time
 
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
 
Talend Open Studio for Big Data (powered by Apache Hadoop)
Talend Open Studio for Big Data (powered by Apache Hadoop)Talend Open Studio for Big Data (powered by Apache Hadoop)
Talend Open Studio for Big Data (powered by Apache Hadoop)
 
Morning at Lohika
Morning at LohikaMorning at Lohika
Morning at Lohika
 
Simplifying Big Data ETL with Talend
Simplifying Big Data ETL with TalendSimplifying Big Data ETL with Talend
Simplifying Big Data ETL with Talend
 
Talend Open Studio Fundamentals #1: Workspaces, Jobs, Metadata and Trips & Tr...
Talend Open Studio Fundamentals #1: Workspaces, Jobs, Metadata and Trips & Tr...Talend Open Studio Fundamentals #1: Workspaces, Jobs, Metadata and Trips & Tr...
Talend Open Studio Fundamentals #1: Workspaces, Jobs, Metadata and Trips & Tr...
 
Setting the Record Straight: Drupal as an Enterprise Web Content Management S...
Setting the Record Straight: Drupal as an Enterprise Web Content Management S...Setting the Record Straight: Drupal as an Enterprise Web Content Management S...
Setting the Record Straight: Drupal as an Enterprise Web Content Management S...
 
Intro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data IntegrationIntro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data Integration
 
Security of DNS
Security of DNSSecurity of DNS
Security of DNS
 
planning & project management for DWH
planning & project management for DWHplanning & project management for DWH
planning & project management for DWH
 
Open Source ETL using Talend Open Studio
Open Source ETL using Talend Open StudioOpen Source ETL using Talend Open Studio
Open Source ETL using Talend Open Studio
 
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksPowering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
 
Macro skills in learning
Macro skills in learningMacro skills in learning
Macro skills in learning
 
Database vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative ReviewDatabase vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative Review
 
No-Ops で大量データ処理基盤を簡単に実現する
No-Ops で大量データ処理基盤を簡単に実現するNo-Ops で大量データ処理基盤を簡単に実現する
No-Ops で大量データ処理基盤を簡単に実現する
 
Implementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceImplementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data Governance
 
Effective Corporate Trainer Skills PPT - Training and Development
Effective Corporate Trainer Skills PPT - Training and DevelopmentEffective Corporate Trainer Skills PPT - Training and Development
Effective Corporate Trainer Skills PPT - Training and Development
 
Tech talk: PHP
Tech talk: PHPTech talk: PHP
Tech talk: PHP
 

Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.

  • 1. Scalable ETL with Talend and Hadoop. Talend, Global Leader in Open Source Integration Solutions. Cédric Carbone – Talend CTO. Twitter: @carbone. ccarbone@talend.com
  • 2. Why talk about ETL with Hadoop? Hadoop is complex. BI consultants don’t have the skills to work with Hadoop. ➜ The biggest issue in a Hadoop project is finding skilled Hadoop engineers ➜ An ETL tool like Talend can help to democratize Hadoop
  • 3. Trying to get from this…
  • 4. to this… Why Talend… Talend generates code that is executed within MapReduce. This open approach removes the limitation of a proprietary “engine” to provide a truly unique and powerful set of tools for big data.
  • 5. Big Data Management. Big Data Production: RDBMS, Analytical DB, NoSQL DB, ERP/CRM, SaaS, Social Media, Web Analytics, Log Files, RFID, Call Data Records, Sensors, Machine-Generated. Big Data Management: Big Data Integration (Storage, Processing, Filtering) and Big Data Quality (Parsing, Checking). Big Data Consumption: Mining, Analytics, Search, Enrichment. Turn Big Data into actionable information
  • 6. Two methods for inserting data quality into a big data job: 1. Pipelining: as part of the load process 2. Load the cluster, then implement and execute a data quality MapReduce job
  • 7. Extract – Transform – Load (E-T-L)
  • 8. Extract – Improve/Cleanse – Load (E-DQ-L)
  • 9. Pipelining: data quality with big data. Sources (CRM, ERP, Finance, Social Networking, Mobile Devices) each pass through DQ before reaching Big Data. • Use traditional data quality tools • Once and done
  • 10. Big data alternative: Load and improve within the cluster. Sources (CRM, ERP, Finance, Social Networking, Mobile Devices) are loaded into Big Data, with DQ applied there. • Load first, improve later • Complex matching cannot be done outside
  • 11. One key DQ rule: Match ➜ Find duplicates within Hadoop ➜ Today’s matching algorithms are processor-intensive ➜ Tomorrow’s matching algorithms could be more precise, more intensive
  • 13. What’s Hadoop? ➜ The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. ➜ Java framework for storing data and running data transformations on large clusters of commodity hardware ➜ Licensed under the Apache v2 license ➜ Created from Google's MapReduce, BigTable and Google File System (GFS) papers
  • 14. Hadoop ecosystem ➜ Sqoop ➜ Pig ➜ Hive ➜ Oozie ➜ HCatalog ➜ Mahout ➜ HBase
  • 15. Talend for Big Data: Hadoop Story. 4.0 [April 2010]: Put or get data into Hadoop through HDFS connectors. 4.1 [Oct 2010]: Hadoop Query (Hive); Bulk load & fast export to Hadoop (Sqoop). 4.2 [May 2011]: Transformation (Pig). 5.0 [Nov 2011]: HBase NoSQL; Extend our tPig*. 5.1 [May 2012]: Metadata (HCatalog); Deployment & Scheduling (Oozie); Embedded into HDP; HCatalog. 5.2 [Oct 2012]: Visual ELT mapping (Hive); Data Lineage & Impact Analysis. 5.3 [June 2013]: Visual Pig mapping; Machine Learning (Mahout); Native MapReduce Code Gen
  • 16. Democratizing Integration with Data Integration tools for Big Data
  • 17. WordCount ➜ WordCount • Comes with Hadoop • “First” demo that everyone tries! ➜ How-to in Talend Big Data • Simple read, count, load results • No coding, just drag-n-drop • Runs remotely
  • 18. Map Reduce: Data Node 1, Data Node 2
  • 24. WordCount: How-to with a Graphical ETL
  • 25. Thank You!
  • 26. Let us show you…
  • 29. Pig Latin Script generation: FOREACH tPigMap_1_out1_RESULT GENERATE $4 AS Revenu, $6 AS Label
  • 32. Apache Mahout ➜ Big Data can also be a blob of data to an organization ➜ Apache Mahout provides algorithms for understanding data – data mining ➜ “You don’t know what you don’t know.” And Mahout will tell you.
  • 33. Metadata Management ➜ Centralized metadata repository for Hadoop Cluster, HDFS, Hive… • Versioning • Impact Analysis and Data Lineage ➜ HCatalog across • HDFS • Hive • Pig
  • 34. Thank You!
  • 35. Choose your Hadoop distro ➜ Widely adopted ➜ Management tooling is not OSS ➜ Fully open source ➜ Strong developer ecosystem ➜ More proprietary ➜ GTM partner with AWS ➜ Many more are coming
  • 36. Choose your Hadoop distro. Distributions provide tooling: ➜ For installation ➜ For server monitoring. But: ➜ No GUI for parsing, transforming, or easily loading data. No data management
  • 37. Parse and Standardize ➜ Big Data is not always structured ➜ Correct big data so that data conforms to the same rules
  • 39. Implications for Integration: DATA. Challenge: Information explosion increases complexity of integration and requires governance to maintain data quality. Requirement: Information processing must scale
  • 40. Implications for Integration: APPLICATION. Challenge: Brittle, point-to-point connections cannot adapt to evolving business requirements, new channels, and quickly changing topologies. Requirement: Application architecture must scale
  • 41. Implications for Integration: PROCESS. Challenge: Competitive market forces drive frequent process changes and increased process complexity. Requirement: Business processes must scale
  • 42. Implications for Integration. Challenge: Interdependencies across data, applications and processes require more resources and budget. Requirement: Resources and skillsets must scale
  • 43. Integration at Any Scale. True scalability for • Any integration challenge • Any data volume • Any project size. Enables integration convergence
  • 44. Technology that Scales ➜ STANDARDS-BASED (Java, SQL, Camel, MapReduce, …): Easy to learn, flexible to adopt, reduces vendor lock-in ➜ CODE GENERATOR: No black-box engine means faster maintenance and deployment with improved quality ➜ The "engine" for Big Data is Hadoop, making it uniquely run at infinite scale.
  • 45. Technology Continuum… ➜ ETL • Java Code • Partitioning • Parallelization ➜ ELT (SQL CodeGen) • Teradata • Netezza • Vertica ➜ • Visual Wizard • Hive/Pig/MR CodeGen • HDFS, Sqoop, Oozie… ➜ NoSQL • MongoDB • Neo4j • Cassandra • HBase ➜ Google BigQuery ➜ Amazon Redshift
  • 46. Talend Overview ➜ At a glance ▶ Founded in 2005 ▶ Offers highly scalable integration solutions addressing Data Integration, Data Quality, MDM, ESB and BPM ▶ Provides: § Subscriptions including 24/7 support and indemnification § Worldwide training and services ▶ Recognized as the open source leader in each of its market categories ➜ Talend today ➜ 400 employees in 7 countries with dual HQ in Los Altos, CA and Paris, France ➜ Over 4,000 paying customers across different industry verticals and company sizes ➜ Backed by Silver Lake Sumeru, Balderton Capital and Idinvest Partners ➜ High growth through a proven model • Brand Awareness: 20 million downloads • Adoption: 1,000,000 users • Market Momentum: +50 new customers / month • Monetization: 4,000 customers
  • 47. Talend's Unique Integration Solution ➜ Best-of-Breed Solutions: Data Quality, Data Integration, MDM, ESB, BPM ➜ + Talend Unified Platform • Studio: comprehensive Eclipse-based user interface • Repository: consolidated metadata & project information • Deployment: web-based deployment & scheduling • Execution: same container for batch processing, message routing & services • Monitoring: single web-based monitoring console ➜ = Unique Integration Solution ➜ Reduce costs ➜ Eliminate risk ➜ Reuse skills ➜ Economies of scale ➜ Incremental adoption ➜ Recognized as the open source leader in each of its market categories by all industry analysts
  • 48. Solutions that Scale ➜ Big Data, Data Quality, Data Integration, MDM, ESB, BPM ➜ TALEND UNIFIED PLATFORM: Studio, Repository, Deployment, Execution, Monitoring ➜ UNIFIED PLATFORM: A shared foundation and toolset increases resource reuse ➜ CONVERGED INTEGRATION: Use for any data, application and process project
  • 49. The 6 Dimensions of BIG Data ➜ Primary challenges • Volume • Velocity • Variety ➜ And also • Complexity • Validation • Lineage
  • 50. What Is BIG Data? ➜ "Big data" is information of extreme size, diversity, complexity and need for rapid processing. (Ted Friedman, Information Infrastructure and Big Data Projects Key Initiative Overview, July 2011) ➜ 3,500 tweets per second (June 2011) ➜ 1,000,000 transactions per day at Walmart ➜ 200 billion (200,000,000,000) intelligent devices (2015) ➜ 275 exabytes (275,000,000,000,000,000,000 bytes) of data flowing over the Internet each day (2020)
  • 51. What is Big Data?  
  • 52. How to define Big data is… ➜ Hans Rosling uses big data to analyze world health trends ➜ Key Takeaway #1: volume, variety, velocity
  • 53. Traditional Data Flows ➜ CRM, ERP, Finance ➜ ETL ➜ Data Quality ➜ Normalized Data ➜ Traditional Data Warehouse ➜ Business Analyst, Warehouse Administrator, Business User, Executives • Scheduled: daily or weekly, sometimes more frequently • Volumes rarely exceed terabytes
  • 54. The new world of big data CRM   Social  Networking       ERP   Finance   Big Data
  • 55. The new world of big data Social  Networking     CRM     ERP   Finance     Big Data Mobile  Devices  
  • 56. The new world of big data ➜ Social Networking, CRM, ERP, Mobile Devices, Transactions, Finance, Network Devices, Sensors ➜ Big Data
  • 57. Key  Takeaway  #2   Forces us to think differently
  • 58. Data driven business ➜ Governance ➜ Data enables information; information supports decisions; decisions drive your business ➜ Information provides value to the business ➜ "If you can't rely on your information then the result can be missed opportunities, or higher costs." Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).
  • 59. BIG data driven business ➜ Governance ➜ BIG data enables BIG information; BIG information supports BIG decisions; BIG decisions drive BIG business ➜ Information provides value to the business ➜ "If you can't rely on your information then the result can be missed opportunities, or higher costs." Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).
  • 60. How is big data integration being used? ➜ Use Cases • Recommendation Engine • Sentiment Analysis • Risk Modeling • Fraud Detection • Behavior Analysis • Marketing Campaign Analysis • Customer Churn Analysis • Social Graph Analysis • Customer Experience Analytics • Network Monitoring ➜ BUT: to what level is DQ required for your use case?
  • 61. Key  Takeaway  #3   Define your use case