The Big Data Ecosystem
Talend & Caserta Concepts Webinar


Ciaran Dynes
Director, Product Management & Product Marketing, Talend


Joe Caserta
Founder & President, Caserta Concepts
Integration at Any Scale
Talend is the only integration vendor that enables
your business to scale through:

• An open source-based solution supported by a vast community and enterprise-class services

• An innovative, unified platform that scales data, application and business processes of any complexity

• A usage-based subscription model delivering a fast return on investment
Talend - Integration at Any Scale

Talend offers true scalability for
• Any integration challenge
• Any data volume
• Any project size

Talend enables integration convergence
Working with Leading Vendors

• Platforms/Hadoop
• Appliance
• NoSQL
• Data Management
• Analytics
• System Integrators

System Integrators play a vital role in providing expertise
Joe Caserta Timeline
2012
• Partnered with Big Data vendors: Cloudera, Hortonworks, Datameer, more…
• Laser focus on Big Data solutions for Financial Sector & eCommerce
2010
• Formalized Talend Alliance Partnership – System Integrators
2009
• Launched Big Data practice
2004
• Co-author, with Ralph Kimball, The Data Warehouse ETL Toolkit (Wiley)
• Launched Training practice, teaching data concepts world-wide
2001
• Founded Caserta Concepts in NYC
• Web log analytics solution published in Intelligent Enterprise
1996
• Dedicated to Data Warehousing, Business Intelligence since 1996
1986
• Began consulting career as programmer/data modeler
• 25+ years hands-on experience building database solutions
Caserta Concepts
• Technology services company with expertise in data analysis:
  • Data Management
  • Big Data & Analytics

• With core focus in the following industries:
  • Financial Services
  • Insurance / Healthcare
  • eCommerce / Higher Education

• Established in 2001:
  • Increased growth year-over-year
  • Industry-recognized workforce
  • Consulting, Writing, Education
Expertise & Offerings
• Strategic Roadmap/Assessment/Consulting
• Big Data Analytics
• Data Warehousing/ETL/Data Integration
• BI/Visualization/Analytics
• Master Data Management
Client Portfolio
• Finance & Insurance
• Retail/eCommerce & Manufacturing
• Education & Services
The Good Old Days: Traditional Data Warehousing
[Diagram: source systems (Web Logs, External Data Sources, Relational Systems/ERP, Legacy Systems) -> Extract, Transform, Optimized Load -> Data Warehouse (with Metadata) -> Data Marts ("The data warehouse?") -> Standard Reports, Ad-hoc Query Tools, Data Mining, MDD/OLAP, Analytical Applications. Closed-loop feedback applications are also shown.]
What is “Big Data”?
• A collection of data sets so large and complex that
 it becomes difficult to process using on-hand
 database management tools or traditional data
 processing applications.

• Challenges include capture, storage, search,
 sharing, transfer, analysis, and visualization.

• Relational databases were designed for
 applications; we use only a small fraction of their
 capabilities in analytics applications.

• Enforcing a relational structure upon our data is
 not always what we want.
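
As a rough illustration of the last bullet, the sketch below reads records whose fields differ from one record to the next and applies structure only at query time. The event records and field names are invented for this example, and plain Python stands in for whatever storage layer would actually hold the data.

```python
import json
from collections import Counter

# Hypothetical raw events: each record carries a different set of fields,
# so a single fixed relational schema would be mostly empty columns.
RAW_EVENTS = [
    '{"user": "u1", "action": "view", "page": "/home"}',
    '{"user": "u2", "action": "purchase", "sku": "A-100", "amount": 25.0}',
    '{"user": "u1", "action": "search", "terms": ["hadoop", "etl"]}',
]


def field_usage(lines):
    """Count how often each field appears across the raw records."""
    counts = Counter()
    for line in lines:
        counts.update(json.loads(line).keys())
    return counts


def purchases_total(lines):
    """Apply structure at read time: only purchase events have an amount."""
    return sum(
        rec["amount"]
        for rec in map(json.loads, lines)
        if rec.get("action") == "purchase"
    )


usage = field_usage(RAW_EVENTS)
total = purchases_total(RAW_EVENTS)
```

Only `user` and `action` appear in every record; the remaining fields are sparse, which is the situation where forcing one relational structure up front tends to get in the way.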
What’s the Difference?
Traditional Data                                        | Big Data
Very accurate transactional data. Analyzed by humans    | Lots of data with value that can only be attained by deep analytics
Measured in terabytes                                   | Measured in petabytes
Structured data                                         | Structured/Unstructured data
Input by human “system users”                           | Created by everybody, plus all of our machine friends
Oracle, SAP, etc.                                       | Open source, Hadoop
HW/SW investment measured in $10M                       | HW/SW investment measured in $10K
Recording facts                                         | Harvesting insights

Try to keep up: This slide is already obsolete
So where does the data warehouse come in?
 • Will Big Data replace the data warehouse?
   • Yes – however there is much evolution ahead: real-time
     integrations, interactive queries

 • Data Warehousing principles still apply to Big Data
   • Data Quality
   • Master Data
   • Data architecture


 • How do we leverage our existing investment?
Enterprise Technical Ecosystem
[Diagram: ERP, Finance and Legacy sources feed, via ETL, both a Traditional EDW (serving Traditional BI with Ad-Hoc/Canned Reporting) and a Big Data Cluster. The cluster runs Mahout, MapReduce and Pig/Hive over the Hadoop Distributed File System (HDFS) on nodes N1-N5, alongside a NoSQL database (Cassandra). It is labeled "Horizontally Scalable Environment - Optimized for Analytics" and serves Big Data BI: Search/Data Analytics and Canned Reporting.]
Extending EDW with Hadoop

• Eliminate the barrier of imposing relational structure on data

• Storage is fast, durable and cheap: don’t throw away data that
  can be valuable in the future

• Processing power
  • Hadoop scales linearly, so don’t worry about the data set getting
    too big

• Machine learning

• Ad-hoc reporting by non-technical users requires traditional
  methods or an additional application
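
The processing model behind the points above can be sketched in a few lines. This is a single-process toy in plain Python, not the Hadoop API: the function names, the log format ("user page") and the sample lines are all assumptions made for illustration. It shows the map, shuffle and reduce steps that a Hadoop cluster spreads across its nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (page, 1) for one web-log line of the assumed form 'user page'."""
    _user, page = line.split()
    yield page, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


LOG = ["u1 /home", "u2 /home", "u1 /cart"]
hits = run_job(LOG)
```

Because each map call sees one line and each reduce call sees one key, the work can be split across machines without the functions changing, which is the property that lets this style of job grow with the data set.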
Design Pattern #1: Hadoop Staging/Warehouse
feeds relational EDW (Composite Warehouse)
 • Hadoop serves as the staging ground for all data
     - Eliminates the barrier of imposing relational structure on data
     - Storage is fast, durable and cheap: don’t throw away data that can be
       valuable in the future

 • Data scientists work in the Hadoop environment to analyze and mine structured
   and unstructured data using Pig, Hive, and Mahout (machine learning)

 • Data required for interactive reporting and traditional ad-hoc analysis is sent to
   the downstream relational EDW
[Diagram: Source Systems load into a Hadoop cluster (Mahout, MapReduce and Pig/Hive over the Hadoop Distributed File System (HDFS) on nodes N1-N5), which feeds the Traditional DW.]
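
A minimal sketch of this pattern, under stated assumptions: a Python list stands in for raw files staged in Hadoop, a small function stands in for the Pig/Hive batch step, and an in-memory SQLite database stands in for the downstream relational EDW. The table name `page_hits` and the sample records are hypothetical.

```python
import sqlite3
from collections import Counter

# Stand-in for raw data kept in Hadoop staging (hypothetical records).
STAGED = [
    {"day": "2012-11-01", "page": "/home"},
    {"day": "2012-11-01", "page": "/home"},
    {"day": "2012-11-02", "page": "/cart"},
]


def summarize(records):
    """Batch step (Pig/Hive would do this at scale): hits per day and page."""
    return Counter((r["day"], r["page"]) for r in records)


def load_to_edw(conn, summary):
    """Send only the reporting-ready aggregate to the relational warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_hits (day TEXT, page TEXT, hits INTEGER)"
    )
    conn.executemany(
        "INSERT INTO page_hits VALUES (?, ?, ?)",
        [(day, page, n) for (day, page), n in summary.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_to_edw(conn, summarize(STAGED))

# Ad-hoc SQL now runs against the small relational copy, not the raw data.
rows = conn.execute(
    "SELECT day, SUM(hits) FROM page_hits GROUP BY day ORDER BY day"
).fetchall()
```

The raw records stay in staging untouched, so a different aggregate can be produced later without going back to the source systems.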
Design Pattern #2: NoSQL Enhanced EDW
 •Not all structured data lends itself to being stored relationally:
    • Relationships: Graph Databases
    • Sparse Data: Columnar Databases

 •Very Large Datasets:
    • NoSQL databases are capable of scaling far beyond relational databases while
      maintaining performance
    • Ultra-performance key-value stores and columnar databases can be very useful for
      storing certain types of high-volume data for analytic purposes
    • Just don’t expect the ad-hoc flexibility of a relational database!



[Diagram: a Hadoop cluster (Mahout, MapReduce and Pig/Hive over the Hadoop Distributed File System (HDFS) on nodes N1-N5) and the Traditional DW, complemented by two NoSQL stores: Cassandra (columnar) for web analytics and ad impressions, and Titan (graph) for networks, recommenders and path optimization.]
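
The two data shapes named in this pattern can be sketched with plain Python structures. This is only an analogy, not the Cassandra or Titan API: the dictionaries, names and sample values below are invented for illustration.

```python
from collections import deque

# Sparse data: each row stores only the columns it actually has,
# a rough analogue of a wide-row / columnar store.
impressions = {
    "ad-1": {"2012-11-01": 120, "2012-11-03": 45},
    "ad-2": {"2012-11-02": 10},
}


def total_impressions(rows, ad_id):
    """Sum whatever date columns exist for one ad; absent columns cost nothing."""
    return sum(rows.get(ad_id, {}).values())


# Relationships: an adjacency list, the shape a graph database works with.
follows = {"ann": ["bob"], "bob": ["cat"], "cat": ["dan"], "dan": []}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest list of nodes, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

For example, `shortest_path(follows, "ann", "dan")` walks the relationships directly, where a relational model would need one self-join per hop.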
Design Pattern #3: Add analytics to your NoSQL
cluster
  • If your application is already based on a NoSQL technology, consider
    building an analytic site.
  • The analytic site is constantly streamed fresh transactions, leveraging
    Cassandra's native replication.
  • Aggregates and analytic views are materialized with Pig/Hive MapReduce jobs.
    Since the work is done on the cluster, no load is placed on the applications.
    This analytic data is in turn replicated throughout the cluster.


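[Diagram: Site 1 and Site 2 each run Cassandra and replicate to an Analytics Site, where Cassandra is paired with Pig/Hive MapReduce. Canned Reporting and the Traditional DW are shown alongside the Analytics Site.]

Remember, NoSQL schemas are “optimized to a query”, not ad-hoc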
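
What "optimized to a query" means in practice can be sketched as follows. This is a plain-Python illustration, not Cassandra or Pig/Hive code: the transaction records, the `(site, day)` key and the function names are assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical transactions replicated from the application sites.
TRANSACTIONS = [
    {"site": "site1", "day": "2012-11-01", "amount": 10.0},
    {"site": "site2", "day": "2012-11-01", "amount": 5.0},
    {"site": "site1", "day": "2012-11-02", "amount": 7.5},
]


def materialize_daily_sales(transactions):
    """Batch job: build a view keyed exactly the way the report reads it."""
    view = defaultdict(float)
    for tx in transactions:
        view[(tx["site"], tx["day"])] += tx["amount"]
    return dict(view)


def daily_sales(view, site, day):
    """The one query this view is built for: a single key lookup."""
    return view.get((site, day), 0.0)


view = materialize_daily_sales(TRANSACTIONS)
```

The lookup by `(site, day)` is cheap, but a different question, such as sales per customer, would need its own materialized view, which is why canned reporting fits this pattern better than ad-hoc analysis.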
Emerging Tools

 Hive, although an excellent tool for data
 analysis, is too slow for interactive
 queries. Recent projects have increased
 query speed dramatically (10-100x).

 •   Google Dremel
 •   Apache/MapR Drill
 •   Hortonworks Stinger
 •   Cloudera Impala
Commonly Used Technologies
• Amazon Elastic MapReduce (EMR): Web service to access EC2/S3; pay-as-you-go
  hosted Hadoop infrastructure

• Hadoop Distributions: Cloudera; MapR; Hortonworks
• Apache Projects
    • Whirr: Used to launch/kill computing clusters
    • Kafka: Publish-subscribe messaging system
    • Mahout: Distributed machine learning
    • Hive: Map data to structures and use SQL-like queries
    • HBase: NoSQL/non-relational database, real-time read/write
    • Cassandra: Like HBase, no single point of failure
    • Chukwa/Flume: Large-scale log collection
    • Pig: Procedural programming language, from Yahoo
    • Sqoop: “SQL-to-Hadoop”, like BCP for Hadoop
    • ZooKeeper: Used to manage & administer Hadoop
    • Solr: Full-text/Faceted Search
• Other open source
    • MongoDB: Document-oriented database
• Languages: Python (with SciPy), Java
Leading Vendors (According to Joe)
• Hadoop
• NoSQL
• Analytics
• Data Management
Parting Thought

 Polyglot Persistence – “where any decent sized
 enterprise will have a variety of different data storage
 technologies for different kinds of data. There will still
 be large amounts of it managed in relational stores,
 but increasingly we'll be first asking how we want to
 manipulate the data and only then figuring out what
 technology is the best bet for it.”
                                      -- Martin Fowler
Questions?
Please ask your questions now using the Q&A panel
Resources

➜    Recording will be made available on
       www.talend.com/resources/webinars

➜    Request a copy of the slides
       webinar@talend.com

➜    Contact Talend Sales
       • Email: sales@talend.com
       • Phone: 714.786.8140

➜    Contact Caserta Concepts
       • Joe Caserta, President
       • Email: joe@casertaconcepts.com
       • Phone: 855.755.2246 x227

© Talend 2012

More Related Content

What's hot

DOAG Big Data Days 2017 - Cloud Journey
DOAG Big Data Days 2017 - Cloud JourneyDOAG Big Data Days 2017 - Cloud Journey
DOAG Big Data Days 2017 - Cloud JourneyHarald Erb
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesSpringPeople
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache HadoopSuman Saurabh
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big dataRaul Chong
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research reportJULIO GONZALEZ SANZ
 
Future of Data - Big Data
Future of Data - Big DataFuture of Data - Big Data
Future of Data - Big DataShankar R
 
DW Appliance
DW ApplianceDW Appliance
DW ApplianceShankar R
 
Hadoop - An Introduction
Hadoop - An IntroductionHadoop - An Introduction
Hadoop - An IntroductionShankar R
 
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",..."From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...Dataconomy Media
 
IBM Big Data in the Cloud
IBM Big Data in the CloudIBM Big Data in the Cloud
IBM Big Data in the CloudRob Thomas
 
Big data – a brief overview
Big data – a brief overviewBig data – a brief overview
Big data – a brief overviewDorai Thodla
 
The Future Of Big Data
The Future Of Big DataThe Future Of Big Data
The Future Of Big DataMatthew Dennis
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations PresentationAdam Doyle
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013boorad
 

What's hot (20)

Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
DOAG Big Data Days 2017 - Cloud Journey
DOAG Big Data Days 2017 - Cloud JourneyDOAG Big Data Days 2017 - Cloud Journey
DOAG Big Data Days 2017 - Cloud Journey
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practices
 
Exploring Big Data Analytics Tools
Exploring Big Data Analytics ToolsExploring Big Data Analytics Tools
Exploring Big Data Analytics Tools
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache Hadoop
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big data
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research report
 
Future of Data - Big Data
Future of Data - Big DataFuture of Data - Big Data
Future of Data - Big Data
 
BigData
BigDataBigData
BigData
 
DW Appliance
DW ApplianceDW Appliance
DW Appliance
 
Hadoop - An Introduction
Hadoop - An IntroductionHadoop - An Introduction
Hadoop - An Introduction
 
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",..."From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
 
data warehouse vs data lake
data warehouse vs data lakedata warehouse vs data lake
data warehouse vs data lake
 
IBM Big Data in the Cloud
IBM Big Data in the CloudIBM Big Data in the Cloud
IBM Big Data in the Cloud
 
Big data – a brief overview
Big data – a brief overviewBig data – a brief overview
Big data – a brief overview
 
The Future Of Big Data
The Future Of Big DataThe Future Of Big Data
The Future Of Big Data
 
Bigdata
BigdataBigdata
Bigdata
 
Study: #Big Data in #Austria
Study: #Big Data in #AustriaStudy: #Big Data in #Austria
Study: #Big Data in #Austria
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations Presentation
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013
 

Viewers also liked

Technical Radar (Chinese version) 2014-06
Technical Radar (Chinese version) 2014-06Technical Radar (Chinese version) 2014-06
Technical Radar (Chinese version) 2014-06Freyr Lin
 
Container microservices
Container microservicesContainer microservices
Container microservicesTsuyoshi Ushio
 
Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...
Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...
Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...VMware Tanzu
 
Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, Rocana
Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, RocanaSolr At Scale For Time-Oriented Data: Presented by Brett Hoerner, Rocana
Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, RocanaLucidworks
 
20170918 remiqz - big data expo - final
20170918   remiqz - big data expo - final20170918   remiqz - big data expo - final
20170918 remiqz - big data expo - finalBigDataExpo
 
Build_Buy_StreamAnalytix_WhitePaper
Build_Buy_StreamAnalytix_WhitePaperBuild_Buy_StreamAnalytix_WhitePaper
Build_Buy_StreamAnalytix_WhitePaperJane Roberts
 
Architecting Security and Governance Across Multi Accounts
Architecting Security and Governance Across Multi AccountsArchitecting Security and Governance Across Multi Accounts
Architecting Security and Governance Across Multi AccountsAmazon Web Services
 
Next Generation Data Center Strategies
Next Generation Data Center StrategiesNext Generation Data Center Strategies
Next Generation Data Center StrategiesVenkat Nambiyur
 
Boston Devops Meetup June 22nd
Boston Devops Meetup June 22ndBoston Devops Meetup June 22nd
Boston Devops Meetup June 22ndmdilawari
 
Why choose VMware vCloud Suite Standard over vSOM
Why choose VMware vCloud Suite Standard over vSOMWhy choose VMware vCloud Suite Standard over vSOM
Why choose VMware vCloud Suite Standard over vSOMAnil Gupta (AJ) - vExpert
 
Agile Operations Keynote: Redefine the Role of IT Operations With Digital Tra...
Agile Operations Keynote: Redefine the Role of IT Operations With Digital Tra...Agile Operations Keynote: Redefine the Role of IT Operations With Digital Tra...
Agile Operations Keynote: Redefine the Role of IT Operations With Digital Tra...CA Technologies
 
A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...DataWorks Summit
 
Plan de transport 2014: le Brabant Flamand
Plan de transport 2014: le Brabant FlamandPlan de transport 2014: le Brabant Flamand
Plan de transport 2014: le Brabant FlamandSNCB
 
BVBA SOSIS van Jeroen Meus kent rustige start
BVBA SOSIS van Jeroen Meus kent rustige startBVBA SOSIS van Jeroen Meus kent rustige start
BVBA SOSIS van Jeroen Meus kent rustige startThierry Debels
 
TOON Stephen Galsworthy
TOON Stephen GalsworthyTOON Stephen Galsworthy
TOON Stephen GalsworthyBigDataExpo
 
Partner Presentation vSphere6-VSAN-vCloud-vRealize
Partner Presentation vSphere6-VSAN-vCloud-vRealizePartner Presentation vSphere6-VSAN-vCloud-vRealize
Partner Presentation vSphere6-VSAN-vCloud-vRealizeErik Bussink
 
DFW meetup Cognitive services - parashar - feb 22
DFW meetup Cognitive services -  parashar - feb 22DFW meetup Cognitive services -  parashar - feb 22
DFW meetup Cognitive services - parashar - feb 22Parashar Shah
 
Freek bomhof tno
Freek bomhof tnoFreek bomhof tno
Freek bomhof tnoBigDataExpo
 

Viewers also liked (20)

Technical Radar (Chinese version) 2014-06
Technical Radar (Chinese version) 2014-06Technical Radar (Chinese version) 2014-06
Technical Radar (Chinese version) 2014-06
 
Container microservices
Container microservicesContainer microservices
Container microservices
 
Cloud developer evolution
Cloud developer evolutionCloud developer evolution
Cloud developer evolution
 
Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...
Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...
Modernizing the Legacy - How Dish is Adapting its SOA Services for a Cloud Fi...
 
Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, Rocana
Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, RocanaSolr At Scale For Time-Oriented Data: Presented by Brett Hoerner, Rocana
Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, Rocana
 
20170918 remiqz - big data expo - final
20170918   remiqz - big data expo - final20170918   remiqz - big data expo - final
20170918 remiqz - big data expo - final
 
Build_Buy_StreamAnalytix_WhitePaper
Build_Buy_StreamAnalytix_WhitePaperBuild_Buy_StreamAnalytix_WhitePaper
Build_Buy_StreamAnalytix_WhitePaper
 
Architecting Security and Governance Across Multi Accounts
Architecting Security and Governance Across Multi AccountsArchitecting Security and Governance Across Multi Accounts
Architecting Security and Governance Across Multi Accounts
 
Next Generation Data Center Strategies
Next Generation Data Center StrategiesNext Generation Data Center Strategies
Next Generation Data Center Strategies
 
Sudan tanıtımı
Sudan tanıtımıSudan tanıtımı
Sudan tanıtımı
 
Boston Devops Meetup June 22nd
Boston Devops Meetup June 22ndBoston Devops Meetup June 22nd
Boston Devops Meetup June 22nd
 
Why choose VMware vCloud Suite Standard over vSOM
Why choose VMware vCloud Suite Standard over vSOMWhy choose VMware vCloud Suite Standard over vSOM
Why choose VMware vCloud Suite Standard over vSOM
 
Agile Operations Keynote: Redefine the Role of IT Operations With Digital Tra...
Agile Operations Keynote: Redefine the Role of IT Operations With Digital Tra...Agile Operations Keynote: Redefine the Role of IT Operations With Digital Tra...
Agile Operations Keynote: Redefine the Role of IT Operations With Digital Tra...
 
A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...
 
Plan de transport 2014: le Brabant Flamand
Plan de transport 2014: le Brabant FlamandPlan de transport 2014: le Brabant Flamand
Plan de transport 2014: le Brabant Flamand
 
BVBA SOSIS van Jeroen Meus kent rustige start
BVBA SOSIS van Jeroen Meus kent rustige startBVBA SOSIS van Jeroen Meus kent rustige start
BVBA SOSIS van Jeroen Meus kent rustige start
 
TOON Stephen Galsworthy
TOON Stephen GalsworthyTOON Stephen Galsworthy
TOON Stephen Galsworthy
 
Partner Presentation vSphere6-VSAN-vCloud-vRealize
Partner Presentation vSphere6-VSAN-vCloud-vRealizePartner Presentation vSphere6-VSAN-vCloud-vRealize
Partner Presentation vSphere6-VSAN-vCloud-vRealize
 
DFW meetup Cognitive services - parashar - feb 22
DFW meetup Cognitive services -  parashar - feb 22DFW meetup Cognitive services -  parashar - feb 22
DFW meetup Cognitive services - parashar - feb 22
 
Freek bomhof tno
Freek bomhof tnoFreek bomhof tno
Freek bomhof tno
 

Similar to Scaling Integration with Talend's Open Source Platform

Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data SolutionsMark Kromer
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureCaserta
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & HadoopBlackvard
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinerySteve Loughran
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranJAX London
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which DataWorks Summit
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarioskcmallu
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneySai Paravastu
 
Derfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeDerfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeMicrosoft
 
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02email2jl
 
Creating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitectureCreating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitecturePerficient, Inc.
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKRajesh Jayarman
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingm_hepburn
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game ChangerCaserta
 
the Data World Distilled
the Data World Distilledthe Data World Distilled
the Data World DistilledRTTS
 
Exploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis KapsalisExploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis KapsalisNetAppUK
 
Big Data 視覺化分析解決方案
Big Data 視覺化分析解決方案Big Data 視覺化分析解決方案
Big Data 視覺化分析解決方案Etu Solution
 
Analytic Platforms in the Real World with 451Research and Calpont_July 2012
Analytic Platforms in the Real World with 451Research and Calpont_July 2012Analytic Platforms in the Real World with 451Research and Calpont_July 2012
Analytic Platforms in the Real World with 451Research and Calpont_July 2012Calpont Corporation
 

Similar to Scaling Integration with Talend's Open Source Platform (20)

Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data Solutions
 
Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic Architecture
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & Hadoop
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinery
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve Loughran
 
Hadoop and the Data Warehouse: When to Use Which
Scaling Integration with Talend's Open Source Platform

  • 1. The Big Data Ecosystem Talend & Caserta Concepts Webinar Ciaran Dynes Director, Product Management & Product Marketing, Talend Joe Caserta Founder & President, Caserta Concepts
  • 2. Integration at Any Scale Talend is the only integration vendor that enables your business to scale through: • An open source-based solution supported by a vast community and enterprise-class services • An innovative, unified platform that scales data, application and business processes of any complexity • A usage-based subscription model delivering a fast return on investment
  • 3. Talend - Integration at Any Scale Talend offers true scalability for • Any integration challenge • Any data volume • Any project size Talend enables integration convergence
  • 4. Working with Leading Vendors Platforms/Hadoop Appliance NoSQL Data Management Analytics System Integrators System Integrators play a vital role in providing expertise
  • 5. The Big Data Ecosystem Talend & Caserta Concepts Webinar Joe Caserta Founder & President, Caserta Concepts Ciaran Dynes Director, Product Management & Product Marketing, Talend
  • 6. Joe Caserta Timeline (timeline markers: 2012, 2010, 2009, 2004, 2001, 1996, 1986; entries listed newest first) • Laser focus on Big Data solutions for Financial Sector & eCommerce • Partnered with Big Data vendors Cloudera, Hortonworks, Datameer, more… • Formalized Talend Alliance Partnership – System Integrators • Launched Big Data practice • Co-author, with Ralph Kimball, The Data Warehouse ETL Toolkit (Wiley) • Launched Training practice, teaching data concepts world-wide • Founded Caserta Concepts in NYC • Web log analytics solution published in Intelligent Enterprise • Dedicated to Data Warehousing, Business Intelligence since 1996 • Began consulting career as programmer/data modeler • 25+ years hands-on experience building database solutions
  • 7. Caserta Concepts • Technology services company with expertise in data analysis: • Data Management • Big Data & Analytics • With core focus in the following industries: • Financial Services • Insurance / Healthcare • eCommerce / Higher Education • Established in 2001: • Increased growth year-over-year • Industry recognized work force • Consulting, Writing, Education
  • 8. Expertise & Offerings Strategic Roadmap/ Assessment/Consulting Big Data Analytics Data Warehousing/ ETL/Data Integration BI/Visualization/ Analytics Master Data Management
  • 9. Client Portfolio Finance & Insurance Retail/eCommerce & Manufacturing Education & Services
  • 10. The Good Old Days: Traditional Data Warehousing [Diagram: data sources (Web Logs, External Data Sources, Relational Systems/ERP, Legacy Systems) pass through Extract, Transform, Load into an Optimized Data Warehouse with Metadata and Data Marts (The data warehouse?), which serves Standard Reports, Ad-hoc Query Tools, Data Mining, MDD/OLAP, Analytical Applications and Closed-loop feedback applications]
  • 11. What is “Big Data”? • A collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. • Challenges include capture, storage, search, sharing, transfer, analysis, and visualization. • Relational databases were designed for applications; analytics applications use only a small fraction of their capabilities. • Enforcing a relational structure upon our data is not always what we want.
  • 12. What’s the Difference? (Traditional Data | Big Data) • Very accurate transactional data, analyzed by humans | Lots of data with value that can only be attained by deep analytics • Measured in terabytes | Measured in petabytes • Structured data | Structured/Unstructured data • Input by human “system users” | Created by everybody, plus all of our machine friends • Oracle, SAP, etc. | Open source, Hadoop • HW/SW investment measured in $10M | HW/SW investment measured in $10K • Recording facts | Harvesting insights
  • 13. Try to keep up: This slide is already obsolete
  • 14. So where does the data warehouse come in? • Will Big Data replace the data warehouse? • Yes – however there is much evolution ahead: real time integrations, interactive queries • Data Warehousing principles still apply to Big Data • Data Quality • Master Data • Data architecture • How do we leverage our existing investment?
  • 15. Enterprise Technical Ecosystem [Diagram: Traditional BI: ERP, Finance and Legacy sources, ETL, Traditional EDW, Ad-Hoc/Canned Reporting. Big Data BI: Big Data Cluster with NoSQL Database (Cassandra), Search/Data Analytics, Mahout, MapReduce and Pig/Hive on nodes N1-N5 over the Hadoop Distributed File System (HDFS); Horizontally Scalable Environment - Optimized for Analytics; Canned Reporting]
  • 16. Extending EDW with Hadoop • Eliminate barrier of imposing relational structure on data. • Storage is fast, durable and cheap: don’t throw away data that can be valuable in the future. • Processing power: Hadoop scales linearly, so don’t worry about the data set getting too big. • Machine learning • Ad-hoc reporting by non-technical users requires traditional methods or an additional application.
  • 17. Design Pattern #1: Hadoop Staging/Warehouse feed relational EDW (Composite Warehouse) • Hadoop serves as the staging ground for all data - Eliminate barrier of imposing relational structure on data. - Storage is fast, durable and cheap: don’t throw away data that can be valuable in the future. • Data scientists will work in the Hadoop environment to analyze and mine structured and unstructured data using Pig, Hive, and Mahout (machine learning) • Data required for interactive reporting and traditional ad-hoc analysis is sent to the downstream relational EDW [Diagram: Source Systems; Hadoop cluster (Mahout, MapReduce, Pig/Hive on nodes N1-N5 over the Hadoop Distributed File System (HDFS)); Traditional DW]
  • 18. Design Pattern #2: NoSQL Enhanced EDW • Not all structured data lends itself to being stored relationally: • Relationships: Graph Databases • Sparse Data: Columnar Databases • Very Large Datasets: • NoSQL databases are capable of scaling far beyond relational databases while maintaining performance • Ultra-performance key value stores and columnar databases can be very useful in storing certain types of high volume data for analytic purposes • Just don’t expect the ad-hoc flexibility of a relational database! [Diagram: Hadoop cluster (Mahout, MapReduce, Pig/Hive on nodes N1-N5 over the Hadoop Distributed File System (HDFS)); Cassandra (columnar): Web analytics, Ad Impressions; Titan (graph): Networks, Recommender, Path optimization; Traditional DW]
  • 19. Design Pattern #3: Add analytics to your NoSQL cluster • If your application is already based on a NoSQL technology, consider building an analytic site. • The analytic site is constantly streamed fresh transactions, leveraging Cassandra's native replication. • Aggregates and analytic views are materialized with Pig/Hive map/reduce. Since the work is done on the cluster, no load is placed on the applications. This analytic data is in turn replicated throughout the cluster. • Remember, NoSQL schemas are “optimized to a query”, not ad-hoc. [Diagram: Site 1 (Cassandra); Site 2 (Cassandra); Analytics Site (Cassandra, Pig/Hive, MapReduce); Canned Reporting; Traditional DW]
  • 20. Emerging Tools Hive, although an excellent tool for data analysis, is too slow for interactive queries. Recent projects have increased speed dramatically (10-100x): • Google Dremel • Apache/MapR Drill • Hortonworks Stinger • Cloudera Impala
  • 21. Commonly Used Technologies • Amazon Elastic MapReduce (EMR): Web service to access EC2/S3, pay-as-you-go hosted Hadoop infrastructure • Hadoop Distribution: Cloudera; MapR; Hortonworks • Apache Projects • Whirr: Used to launch/kill computing clusters • Kafka: Publish-subscribe messaging system • Mahout: Distributed machine learning • Hive: Map data to structures and use SQL-like queries • HBase: NoSQL/non-relational database, real-time read/write • Cassandra: Like HBase, no single point of failure • Chukwa/Flume: Large-scale log collection • Pig: Procedural programming language, from Yahoo • Sqoop: “SQL-to-Hadoop”, like BCP for Hadoop • Zookeeper: Used to manage & administer Hadoop • Solr: Full-text/Faceted Search • MongoDB: Document-oriented database • Languages: Python, SciPy, Java
  • 22. Leading Vendors (According to Joe) Hadoop NoSQL Analytics Data Management
  • 23. Parting Thought Polyglot Persistence – “where any decent sized enterprise will have a variety of different data storage technologies for different kinds of data. There will still be large amounts of it managed in relational stores, but increasingly we'll be first asking how we want to manipulate the data and only then figuring out what technology is the best bet for it.” -- Martin Fowler
  • 24. Questions? Please ask your questions now using the Q&A panel
  • 25. Resources ➜ Recording will be made available on www.talend.com/resources/webinars ➜ Request a copy of the slides webinar@talend.com ➜ Contact Talend Sales • Email: sales@talend.com • Phone: 714.786.8140 ➜ Contact Caserta Concepts • Joe Caserta, President • Email: joe@casertaconcepts.com • Phone: 855.755.2246 x227 © Talend 2012
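Design Pattern #1 (slide 17) can be illustrated with a small sketch. It is not taken from the webinar: the sample events, the function names, and the `page_views` table are all hypothetical, an in-memory list stands in for files staged on HDFS, plain Python generators stand in for a MapReduce or Pig/Hive job, and SQLite stands in for the downstream relational EDW. The point is only the flow: raw data is kept as it arrived, aggregation happens on the staging side, and just the small structured result is loaded for ad-hoc SQL.

```python
import json
import sqlite3
from collections import defaultdict

# Hypothetical raw events, staged exactly as they arrived (no relational schema imposed).
RAW_EVENTS = [
    '{"user": "u1", "page": "/home", "ms": 120}',
    '{"user": "u2", "page": "/cart", "ms": 340, "referrer": "ad-17"}',
    '{"user": "u1", "page": "/cart", "ms": 95}',
    'not valid json',  # staging keeps malformed records; the aggregation step skips them
]


def map_phase(lines):
    """Emit (page, 1) pairs, mimicking the map step of a MapReduce job."""
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip records that cannot be parsed
        if isinstance(event, dict) and "page" in event:
            yield event["page"], 1


def reduce_phase(pairs):
    """Sum the counts per key, mimicking the reduce step."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def load_to_edw(conn, totals):
    """Load the small aggregate into a relational table for ad-hoc SQL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_views (page TEXT PRIMARY KEY, views INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO page_views VALUES (?, ?)", sorted(totals.items())
    )
    conn.commit()


def run_pipeline(lines):
    """Stage -> aggregate -> load; returns the connection to the stand-in EDW."""
    conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for the relational EDW
    load_to_edw(conn, reduce_phase(map_phase(lines)))
    return conn


if __name__ == "__main__":
    edw = run_pipeline(RAW_EVENTS)
    for row in edw.execute("SELECT page, views FROM page_views ORDER BY views DESC, page"):
        print(row)
```

In a real deployment the aggregation would run on the cluster and a tool such as Sqoop would move the result into the warehouse; the sketch keeps everything in one process only so that it can be run as-is.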

Editor's Notes

  1. Purpose of the slide: Mission / Vision Statement. Key themes: Talend’s mission is to enable our customers to innovate faster at a lower cost. We are disrupting the traditional integration market by delivering an open source-based solution, an innovative unified platform, and a usage-based subscription model. More from the Talend boilerplate: Talend provides integration that truly scales. From small projects to enterprise-wide implementations, Talend’s highly scalable data, application and business process integration platform maximizes the value of an organization’s information assets and optimizes return on investment through a usage-based subscription model. Ready for big data environments, Talend’s flexible architecture easily adapts to future IT platforms. And a common set of easy-to-use tools implemented across all Talend products enables teams to scale developer skillsets, too.
  2. Purpose of the slide: Introduce Talend’s solution – Integration At Any Scale. Talking points: Talend is disrupting the integration market to address these integration challenges by providing a differentiated solution that provides “Integration at Any Scale”. With Talend, your business can scale to meet any integration challenge, any data volume, or any project size. We will discuss HOW this is done in a moment, but the main point here is what we call “Integration Convergence”. Integration Convergence is the ability to address data, application and process integration needs with the same platform. The benefit to you is that your resources are more efficient and you lower your cost of operations. Talend provides integration that truly scales. From small projects to enterprise-wide implementations, Talend’s highly scalable data, application and business process integration platform maximizes the value of an organization’s information assets and optimizes return on investment through a usage-based subscription model. Ready for big data environments, Talend’s flexible architecture easily adapts to future IT platforms.
  3. Endeca bought by Oracle – “agile information management”; SPSS bought by IBM; Radian6 bought by Salesforce; DataStax – Cassandra; Karmasphere – data analysis platform for Hadoop; Couchbase – NoSQL – Membase and Couchbase; Clarabridge – text analytics
  4. Alternative NoSQL: HBase, Cassandra, Druid, VoltDB
  5. Endeca bought by Oracle – “agile information management”; SPSS bought by IBM; Radian6 bought by Salesforce; DataStax – Cassandra; Karmasphere – data analysis platform for Hadoop; Couchbase – NoSQL – Membase and Couchbase; Clarabridge – text analytics