Big data beyond Hadoop –
How to integrate ALL your data
Kai Wähner
kwaehner@talend.com
@KaiWaehner
www.kai-waehner.de
9/24/2013
© Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Consulting
Developing
Coaching
Speaking
Writing
Main Tasks
Requirements Engineering
Enterprise Architecture Management
Business Process Management
Architecture and Development of Applications
Service-oriented Architecture
Integration of Legacy Applications
Cloud Computing
Big Data
Contact
Email: kontakt@kai-waehner.de
Blog: www.kai-waehner.de/blog
Twitter: @KaiWaehner
Social Networks: Xing, LinkedIn
Kai Wähner
Key messages
You have to care about big data to be competitive in the future!
You have to integrate different sources to get the most value out of it!
Big data integration is no (longer) rocket science!
• Big data paradigm shift
• Challenges of big data
• Big data from a technology perspective
• Integration with an open source framework
• Integration with an open source suite
• Custom big data components
Agenda
Agenda: Big data paradigm shift
William Edwards Deming
(1900–1993)
American statistician, professor,
author, lecturer and consultant
“If you can't measure it,
you can't manage it.”
Why should you care about big data?
• “Silence the HiPPOs” (highest-paid person’s opinion)
• When you can interpret unimaginably large data streams, gut feeling is no longer justified!
Why should you care about big data?
What is big data? The Vs of big data
Volume (terabytes, petabytes)
Variety (social networks, blog posts, logs, sensors, etc.)
Velocity (real-time or near-real-time)
Value
Big Data Integration
– Land data in a Big Data cluster
– Implement or generate parallel processes
Big Data Manipulation
– Simplify manipulation, such as sort and filter
– Computationally expensive functions
Big Data Quality & Governance
– Identify linkages and duplicates, validate big data
– Match component, execute basic quality features
Big Data Project Management
– Place frameworks around big data projects
– Common Repository, scheduling, monitoring
Big data tasks to solve - before analysis
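The manipulation and quality tasks listed above can be made concrete on a tiny in-memory data set. The following Java sketch is neither Hadoop nor Talend code: the `Customer` type, its fields and the naive match key (trimmed, lower-cased name plus city) are hypothetical and only illustrate what “filter and sort” and “identify duplicates” mean. In a cluster, the same logic would run in parallel on partitions of the data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

final class CustomerCleanup {
    // Hypothetical record type used only for this illustration
    static final class Customer {
        final String name;
        final String city;
        final int orders;

        Customer(String name, String city, int orders) {
            this.name = name;
            this.city = city;
            this.orders = orders;
        }
    }

    // Manipulation: keep customers with at least minOrders orders, sorted by orders (descending)
    static List<Customer> filterAndSort(List<Customer> in, int minOrders) {
        return in.stream()
                .filter(c -> c.orders >= minOrders)
                .sorted(Comparator.comparingInt((Customer c) -> c.orders).reversed())
                .collect(Collectors.toList());
    }

    // Quality: naive match key = normalized name + normalized city
    static String matchKey(Customer c) {
        String name = c.name.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        String city = c.city.trim().toLowerCase(Locale.ROOT);
        return name + "|" + city;
    }

    // Group records by match key and return only groups with more than one record (duplicates)
    static List<List<Customer>> findDuplicates(List<Customer> in) {
        Map<String, List<Customer>> groups = new LinkedHashMap<>();
        for (Customer c : in) {
            groups.computeIfAbsent(matchKey(c), k -> new ArrayList<>()).add(c);
        }
        return groups.values().stream()
                .filter(g -> g.size() > 1)
                .collect(Collectors.toList());
    }
}
```

Real match components usually compare records more loosely (for example with edit distance) than this exact normalized key.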
“The advantage of their new system is that they can now look at their data
[from their log processing system] in any way they want:
➜ Nightly MapReduce jobs collect statistics about their mail system such as
spam counts by domain, bytes transferred and number of logins.
➜ When they wanted to find out which part of the world their customers
logged in from, a quick [ad hoc] MapReduce job was created and they had
the answer within a few hours. Not really possible in your typical ETL
system.”
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
Use case: Clickstream Analysis
http://hkotadia.com/archives/5021
Deduce customer defections
Use case: Risk management
➜ With revenue of almost USD 30 billion and a network of
800 locations, Macy's is considered the largest store operator in the
USA
➜ Daily price check analysis of its 10,000 articles in less than two hours
➜ Whenever a neighboring competitor anywhere between New York and Los Angeles cuts prices aggressively, Macy's follows suit
➜ If there is no market competitor, the prices remain unchanged
http://www.t-systems.com/about-t-systems/examples-of-successes-companies-analyze-big-data-in-record-time-l-t-systems/1029702
Use case: Flexible pricing
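The pricing rule described above can be expressed in a few lines. This is a hypothetical Java sketch, not Macy's implementation: it assumes that “follows its example” means matching the nearby competitor's lower price exactly, and it represents prices in cents (`long`) to avoid floating-point rounding. The article names in the usage example are made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

final class CompetitivePricing {
    // Assumed rule: follow a nearby competitor's lower price, otherwise keep the own price
    static long newPrice(long ownCents, OptionalLong competitorCents) {
        if (competitorCents.isPresent() && competitorCents.getAsLong() < ownCents) {
            return competitorCents.getAsLong(); // competitor cut the price: follow it
        }
        return ownCents; // no competitor (or no price cut): price remains unchanged
    }

    // Apply the rule to a whole catalog; articles without competitor data keep their price
    static Map<String, Long> repriceAll(Map<String, Long> ownCents, Map<String, Long> competitorCents) {
        Map<String, Long> result = new HashMap<>();
        ownCents.forEach((article, price) -> {
            Long c = competitorCents.get(article);
            OptionalLong competitor = (c == null) ? OptionalLong.empty() : OptionalLong.of(c);
            result.put(article, newPrice(price, competitor));
        });
        return result;
    }
}
```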
➜ A lot of data must be stored “forever”
➜ Numbers increase exponentially
➜ Goal: As cheap as possible
➜ Problem: (Fast) queries must still be possible
➜ Solution: Commodity servers and “Hadoop querying”
Global Parcel Service
http://archive.org/stream/BigDataImPraxiseinsatz-SzenarienBeispieleEffekte/Big_Data_BITKOM-Leitfaden_Sept.2012#page/n0/mode/2up
Storage: Compliance
Agenda: Challenges of big data
Figure labels: “This is your company”, “Big Data Geek”
Limited big data experts
Big Data + Poor Data Quality = Big Problems
Data quality
➜ Want to buy a big data solution for your industry?
➜ Maybe a competitor has a big data solution that adds business value?
➜ The competitor will never publish it (rat race)!
Big data tool selection (business perspective)
Looking for ‘your’ required big data product?
Support your data from scratch?
Good luck!
Big data tool selection (technical perspective)
How to solve these big data challenges?
• “[Often] simple models and big data trump more-elaborate [and complex] analytics approaches”
• “Often someone coming from outside an industry can spot a better way to use big data than an insider”
Erik Brynjolfsson / Lynn Wu
http://alfredopassos.tumblr.com/post/32461599327/big-data-the-management-revolution-by-andrew-mcafee
No need to be an expert! Keep it simple!
• Look at use cases of others (small and medium-sized companies, but also large ones)
• How can you do something similar with your data?
• You have different data sources? Use them! Combine them! Play with them!
Be creative!
1) Do not begin with the data; think about business opportunities
2) Choose the right data (combine different data sources)
3) Use easy tooling
http://hbr.org/2012/10/making-advanced-analytics-work-for-you
What is your Big Data process?
Agenda: Big data from a technology perspective
Technology perspective
How to process big data?
The critical flaw in parallel ETL tools is the fact that the data is almost never local to the processing
nodes. This means that every time a large job is run, the data has to first be read from the source,
split N ways and then delivered to the individual nodes. Worse, if the partition key of the source
doesn’t match the partition key of the target, data has to be constantly exchanged among the
nodes. In essence, parallel ETL treats the network as if it were a physical I/O subsystem. The
network, which is always the slowest part of the process, becomes the weakest link in the
performance chain.
http://blog.syncsort.com/2012/08/parallel-etl-tools-are-dead
How to process big data?
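The partition-key problem described in the quote can be sketched with hash partitioning. In this hypothetical Java example, each record is a `String[]` and its node is chosen with `(key.hashCode() & Integer.MAX_VALUE) % nodes`, the same formula Hadoop's default `HashPartitioner` uses for reduce tasks. Every record whose node under the source key differs from its node under the target key has to cross the network.

```java
import java.util.List;

final class PartitionMismatch {
    // Hash partitioning: the same key is always assigned to the same node
    static int partitionFor(String key, int nodes) {
        return (key.hashCode() & Integer.MAX_VALUE) % nodes;
    }

    // Counts records that live on one node under the source partition key but belong
    // to another node under the target partition key: each must be sent over the network
    static long recordsToMove(List<String[]> records, int sourceKeyIdx, int targetKeyIdx, int nodes) {
        return records.stream()
                .filter(r -> partitionFor(r[sourceKeyIdx], nodes) != partitionFor(r[targetKeyIdx], nodes))
                .count();
    }
}
```

If both partition keys are the same column, nothing moves. If the two keys hash independently and uniformly, on average about (n − 1)/n of the records move for n nodes, i.e. roughly three quarters with four nodes.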
Slides: http://www.slideshare.net/pavlobaron/100-big-data-0-hadoop-0-java
Video: http://www.infoq.com/presentations/Big-Data-Hadoop-Java
How to process big data?
The de facto standard for big data processing
How to process big data?
Even Microsoft (the .NET house) has relied on Hadoop since 2011
How to process big data?
“A big part of [the company’s strategy] includes wiring SQL Server 2012 (formerly known by the codename ‘Denali’) to the Hadoop distributed computing platform, and bringing Hadoop to Windows Server and Azure”
Apache Hadoop, an open-source software library, is a
framework that allows for the distributed processing of
large data sets across clusters of commodity hardware
using simple programming models. It is designed to scale
up from single servers to thousands of machines, each
offering local computation and storage.
What is Hadoop?
Simple example
• Input: (very large) text files with lists of strings, such as:
"318, 0043012650999991949032412004...0500001N9+01111+99999999999..."
• We are only interested in part of the content: the year and the temperature (here 1949 and +0111, i.e. 11.1 °C)
• The MapReduce job has to compute the maximum temperature for every year
Example from the book “Hadoop: The Definitive Guide, 3rd Edition”
Map → (Shuffle) → Reduce
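The map, shuffle and reduce phases can be simulated in plain Java to show the data flow. This sketch is not the Hadoop API (no `Mapper`/`Reducer` classes, no cluster); it runs on an in-memory list. The field offsets (year at `substring(15, 19)`, signed temperature in tenths of a degree at `substring(87, 92)`, quality flag at position 92, `9999` for missing values) follow the fixed-width NCDC layout used in the book's example; treat them as an assumption if your data differs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

final class MaxTemperature {
    private static final int MISSING = 9999;

    // Map: extract (year, temperature) from one fixed-width record.
    // Missing values and suspect quality flags produce no output.
    static Optional<Map.Entry<String, Integer>> map(String line) {
        String year = line.substring(15, 19);
        int temperature = Integer.parseInt(line.substring(87, 92)); // accepts a leading '+' or '-'
        String quality = line.substring(92, 93);
        if (temperature == MISSING || !quality.matches("[01459]")) {
            return Optional.empty();
        }
        return Optional.of(Map.entry(year, temperature));
    }

    // Reduce: all temperatures of one year -> the maximum
    static int reduce(List<Integer> temperatures) {
        int max = Integer.MIN_VALUE;
        for (int t : temperatures) {
            max = Math.max(max, t);
        }
        return max;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Shuffle: group the mapped values by key (year); TreeMap keeps years sorted
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            map(line).ifPresent(e ->
                    grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue()));
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((year, temps) -> result.put(year, reduce(temps)));
        return result;
    }
}
```

Applied to the sample record on the slide, the map step would emit the pair (1949, 111). On a cluster, Hadoop runs many map calls in parallel and performs the shuffle (grouping by year) between the map and the reduce phase.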
How to process big data?
Agenda: Integration with an open source framework
Figure: complexity of integration (low → high)
• Integration Framework: connectivity, routing, transformation
• Enterprise Service Bus: integration + tooling, monitoring, support
• Integration Suite: ESB + business process mgt., big data / MDM, registry / repository, rules engine, “you name it”
Alternatives for systems integration
Figure: Integration Framework, Enterprise Service Bus and Integration Suite, ordered by complexity of integration (low to high)
Alternatives for systems integration
More details about integration frameworks...
http://www.kai-waehner.de/blog/2012/12/20/showdown-integration-framework-spring-integration-apache-camel-vs-enterprise-service-bus-esb/
More details about integration frameworks...
... or come to my JavaOne session tomorrow to see an updated version of the slides!
Wednesday, 11:30 AM – 12:30 PM
CON1934: Which Integration Framework to Choose?
Enterprise Integration Patterns (EIP)
Apache Camel
Implements the EIPs
Enterprise Integration Patterns (EIP)
Enterprise Integration Patterns (EIP)
Architecture
http://java.dzone.com/articles/apache-camel-integration
HTTP
FTP
File
XSLT
MQ
JDBC
Akka
TCP
SMTP
RSS
Quartz
Log
LDAP
JMS
EJB
AMQP
Atom
AWS-S3
Bean-Validation
CXF
IRC
Jetty
JMX
Lucene
Netty
RMI
SQL
Many, many more, plus custom components
Choose your required components
Choose your favorite DSL
XML
(not production-ready yet)
Deploy it wherever you need
Standalone
OSGi
Application Server
Web Container
Spring Container
Cloud
Enterprise-ready
• Open Source
• Scalability
• Error Handling
• Transaction
• Monitoring
• Tooling
• Commercial Support
Example: Camel integration route
Hadoop Integration with Apache Camel
camel-hdfs component
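import org.apache.camel.builder.RouteBuilder;
public class HdfsRoutes extends RouteBuilder {
    @Override
    public void configure() {
        // Producer: poll files from FTP and append their content to a file in HDFS
        from("ftp://user@myServer?password=secret")
            .to("hdfs:///myDirectory/myFile.txt?append=true");
        // Consumer: read the analysis result from HDFS and write it to the local file system
        from("hdfs:///myDirectory/myBigDataAnalysis.csv")
            .to("file:target/reports?fileName=report.csv");
    }
}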
camel-hbase component
➜ A lot of data must be stored “forever”
➜ Numbers increase exponentially
➜ Goal: As cheap as possible
➜ Problem: (Fast) queries must still be possible
➜ Solution: Commodity servers and “Hadoop querying”
Global Parcel Service
http://archive.org/stream/BigDataImPraxiseinsatz-SzenarienBeispieleEffekte/Big_Data_BITKOM-Leitfaden_Sept.2012#page/n0/mode/2up
Real World Use Case: Storage: Compliance
Real world use case: Storage: Compliance
Figure: Orders (Server 1), Payments (Server 2) and log files (Server 3, ..., Server 100) → ETL → Storage / Query
Live demo
Apache Camel in action...
camel-pig? camel-hive? camel-hcatalog?
Not available yet (current Camel version: 2.12). Workarounds:
• Use Pig / Hive query scripts (via the camel-exec component or any scripting language)
• Build your own component (more details later ...)
• Use the Hive-HBase integration and store data in HBase (“ugly”)
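The first workaround can be sketched without Camel: a Java program (or a route using the camel-exec component) simply starts the Hive or Pig command-line client with a script file. This sketch assumes that `hive` and `pig` are installed and on the `PATH` and that both accept `-f <script>`; `ScriptRunner` is a hypothetical helper and the script names in the test are made up.

```java
import java.io.IOException;
import java.util.List;

final class ScriptRunner {
    // Builds the command line for running a query script with the Hive or Pig CLI
    static List<String> command(String engine, String scriptPath) {
        if (!engine.equals("hive") && !engine.equals("pig")) {
            throw new IllegalArgumentException("unsupported engine: " + engine);
        }
        return List.of(engine, "-f", scriptPath);
    }

    // Runs the script and returns the process exit code (0 usually means success)
    static int run(String engine, String scriptPath) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(engine, scriptPath))
                .inheritIO() // show the CLI output in the current console
                .start();
        return process.waitFor();
    }
}
```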
camel-avro component
... Avro, a data serialization system used in Apache Hadoop. The camel-avro component
provides a data format for Avro, which allows serialization and deserialization of
messages using Apache Avro's binary data format. Moreover, it supports
Apache Avro's RPC by providing producer and consumer endpoints for using Avro
over Netty or HTTP.
Camel is not just about connectors ...
Camel supports a pluggable DataFormat to allow messages to be marshalled to and
unmarshalled from binary or text formats, e.g. CSV, JSON, SOAP, EDI, ZIP, or ...
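To illustrate the marshal/unmarshal idea behind a pluggable data format, here is a stdlib-only Java sketch. It does not use Camel: Camel's real `org.apache.camel.spi.DataFormat` works with an `Exchange` and streams, whereas the `SimpleDataFormat` interface and the `NaiveCsvFormat` class below are hypothetical simplifications (no quoting or escaping).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified counterpart of a data format: object <-> bytes
interface SimpleDataFormat<T> {
    byte[] marshal(T body);
    T unmarshal(byte[] data);
}

// Naive CSV: one row per line, fields separated by commas.
// Fields must not contain commas or line breaks (no quoting support).
final class NaiveCsvFormat implements SimpleDataFormat<List<List<String>>> {
    @Override
    public byte[] marshal(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            sb.append(String.join(",", row)).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public List<List<String>> unmarshal(byte[] data) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : new String(data, StandardCharsets.UTF_8).split("\n")) {
            if (!line.isEmpty()) {
                rows.add(Arrays.asList(line.split(",", -1))); // -1 keeps trailing empty fields
            }
        }
        return rows;
    }
}
```

In a Camel route you would instead use one of the built-in data formats through the DSL, for example `marshal().csv()`.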
Agenda: Integration with an open source suite
Figure (repeated): Integration Framework, Enterprise Service Bus and Integration Suite, ordered by complexity of integration (low to high)
Alternatives for systems integration
More details about ESBs and suites...
http://www.kai-waehner.de/blog/2013/01/23/spoilt-for-choice-how-to-choose-the-right-enterprise-service-bus-esb/
Hadoop Integration with Talend Open Studio
…an open source
ecosystem
Talend Open Studio for Big Data
• Improves the efficiency of big data job design with a graphical interface
• Generates Hadoop code and runs transformations inside Hadoop
• Native support for HDFS, Pig, HBase, HCatalog, Sqoop and Hive
• 100% open source under an Apache License
• Standards based
Vision: Democratize big data
…an open source
ecosystem
Talend Platform for Big Data
• Builds on Talend Open Studio for Big Data
• Adds data quality, advanced scalability and
management functions
• MapReduce massively parallel data
processing
• Shared Repository and remote deployment
• Data quality and profiling
• Data cleansing
• Reporting and dashboards
• Commercial support, warranty/IP indemnity
under a subscription license
Vision: Democratize big data
Talend Open Studio for Big Data
“The advantage of their new system is that they can now look at their data
[from their log processing system] in any way they want:
➜ Nightly MapReduce jobs collect statistics about their mail system such as
spam counts by domain, bytes transferred and number of logins.
➜ When they wanted to find out which part of the world their customers
logged in from, a quick [ad hoc] MapReduce job was created and they had
the answer within a few hours. Not really possible in your typical ETL
system.”
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
Real world Use case: Clickstream Analysis
Real world use case: Clickstream Analysis
Figure: Log files (Server 1, Server 2, ..., Server 100) → ETL
One of the original uses of Hadoop at Yahoo was to store and process their massive volume of
clickstream data. Now enterprises of all types can use Hadoop to refine and analyze
clickstream data. They can then answer business questions such as:
• What is the most efficient path for a site visitor to research a product, and then buy it?
• What products do visitors tend to buy together, and what are they most likely to buy in
the future?
• Where should I spend resources on fixing or enhancing the user experience on my
website?
Goal: Data visualization can help you optimize your website and
convert more visits into sales and revenue.
Potential Uses of Clickstream Data
Source for the clickstream example: “Hortonworks Hadoop Tutorials - Real Life Use Cases”
http://hortonworks.com/blog/hadoop-tutorials-real-life-use-cases-in-the-sandbox
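The second question above (which products are bought together) boils down to counting product pairs per visit. This hypothetical Java sketch assumes the clickstream has already been reduced to one set of purchased products per session; on Hadoop, the map phase would emit one pair per combination and the reduce phase would sum the counts.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class BoughtTogether {
    // Counts how often each unordered pair of products appears in the same session
    static Map<String, Integer> pairCounts(List<Set<String>> sessions) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> session : sessions) {
            // Sort the products so that the pair key is the same regardless of order
            List<String> items = List.copyOf(new TreeSet<>(session));
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    counts.merge(items.get(i) + "+" + items.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```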
Example: A semi-structured log file
Example: ETL Job
“... using Talend’s HDFS and Hive components”
Example: ETL Job
“... using Talend’s MapReduce components*”
* Not available in the open source version of Talend Studio
“Talend Open Studio for Big Data” in action...
Live demo
Example: Analysis with Microsoft Excel
We can see that the largest number of page hits in Florida were for
clothing, followed by shoes.
Source for the clickstream example: “Hortonworks Hadoop Tutorials - Real Life Use Cases”
http://hortonworks.com/blog/hadoop-tutorials-real-life-use-cases-in-the-sandbox
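The Florida result above is essentially a group-by count. This Java sketch assumes already-parsed clicks with hypothetical `state` and `category` fields (not the tutorial's actual schema) and returns the categories of one state ordered by number of page hits.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class PageHits {
    // One parsed click (hypothetical fields for illustration)
    static final class Click {
        final String state;
        final String category;

        Click(String state, String category) {
            this.state = state;
            this.category = category;
        }
    }

    // Page hits per category for one state, ordered from most to fewest hits
    static Map<String, Long> hitsByCategory(List<Click> clicks, String state) {
        Map<String, Long> counts = clicks.stream()
                .filter(c -> c.state.equals(state))
                .collect(Collectors.groupingBy(c -> c.category, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }
}
```

On the cluster, a Hive query with GROUP BY over the refined clickstream table could produce the same numbers before they reach Excel or another BI tool.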
Example: Analysis with Microsoft Excel
The chart shows that the majority of men shopping for clothing on our
website are between the ages of 22 and 30. With this information, we can
optimize our content for this market segment.
Source for the clickstream example: “Hortonworks Hadoop Tutorials - Real Life Use Cases”
http://hortonworks.com/blog/hadoop-tutorials-real-life-use-cases-in-the-sandbox
Example: Analysis with Tableau
Spoilt for choice: use your preferred BI or analysis tool!
Agenda: Custom big data components
Custom components
Easy to realize for all
integration alternatives *
• Integration Framework
• Enterprise Service Bus
• Integration Suite
* At least for open source solutions
Custom components
You might need a ...
• ... Hive component for Camel
• ... Impala component for Talend
• ... custom component for your
internal data format
Live demo (Example: Apache Camel)
Custom components in action...
Alternative for custom components
• SOAP
• REST
Code example: REST API for Salesforce object store
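import org.apache.camel.Exchange;
import org.apache.camel.builder.RouteBuilder;
public class SalesforceRestRoutes extends RouteBuilder {
    // OAuth access token obtained beforehand (assumed to contain the full header value, e.g. "Bearer ...")
    private final String accessToken;
    public SalesforceRestRoutes(String accessToken) {
        this.accessToken = accessToken;
    }
    @Override
    public void configure() {
        // Salesforce Query (SOQL) via REST API
        from("direct:salesforceViaHttpLIST")
            .setHeader("X-PrettyPrint", constant(1))
            .setHeader("Authorization", constant(accessToken))
            .setHeader(Exchange.CONTENT_TYPE, constant("application/json"))
            .setHeader(Exchange.HTTP_METHOD, constant("GET"))
            .to("https://na14.salesforce.com/services/data/v20.0/query?q=SELECT+name+from+Article__c");
        // Salesforce CREATE via REST API (the message body is the JSON object to create)
        from("direct:salesforceViaHttpCREATE")
            .setHeader("X-PrettyPrint", constant(1))
            .setHeader("Authorization", constant(accessToken))
            .setHeader(Exchange.CONTENT_TYPE, constant("application/json"))
            .setHeader(Exchange.HTTP_METHOD, constant("POST"))
            .to("https://na14.salesforce.com/services/data/v20.0/sobjects/Article__c");
    }
}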
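Without Camel, the same SOQL query could be sent with the JDK's own HTTP client (`java.net.http`, Java 11+). This sketch only builds the request: the instance URL and API version are copied from the slide, the token is a placeholder, and `SalesforceQuery` is a hypothetical helper name.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class SalesforceQuery {
    // Instance URL and API version taken from the slide; yours will differ
    static final String BASE = "https://na14.salesforce.com/services/data/v20.0";

    // Builds (but does not send) a GET request for a SOQL query
    static HttpRequest soqlQuery(String soql, String accessToken) {
        String q = URLEncoder.encode(soql, StandardCharsets.UTF_8); // spaces become '+'
        return HttpRequest.newBuilder(URI.create(BASE + "/query?q=" + q))
                .header("Authorization", "Bearer " + accessToken)
                .header("X-PrettyPrint", "1")
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

To send it, `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` returns the JSON result as a string.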
Did you get the key message?
Key messages
You have to care about big data to be competitive in the future!
You have to integrate different sources to get the most value out of it!
Big data integration is no (longer) rocket science!
Did you get the key message?
Thank you for your attention. Questions?
kwaehner@talend.com
www.kai-waehner.de
LinkedIn / Xing
@KaiWaehner

Weitere ähnliche Inhalte

Was ist angesagt?

VisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case studyVisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case studyLeonid Nekhymchuk
 
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...DataWorks Summit
 
Why Migrate from MySQL to Cassandra
Why Migrate from MySQL to CassandraWhy Migrate from MySQL to Cassandra
Why Migrate from MySQL to CassandraDATAVERSITY
 
Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...
Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...
Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...Kai Wähner
 
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsR, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsKai Wähner
 
Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017Amazon Web Services
 
An Introduction to Talend Integration Cloud
An Introduction to Talend Integration CloudAn Introduction to Talend Integration Cloud
An Introduction to Talend Integration CloudTalend
 
Hadoop for Humans: Introducing SnapReduce 2.0
Hadoop for Humans: Introducing SnapReduce 2.0Hadoop for Humans: Introducing SnapReduce 2.0
Hadoop for Humans: Introducing SnapReduce 2.0SnapLogic
 
Beyond Batch: Is ETL still relevant in the API economy?
Beyond Batch: Is ETL still relevant in the API economy?Beyond Batch: Is ETL still relevant in the API economy?
Beyond Batch: Is ETL still relevant in the API economy?SnapLogic
 
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...Kai Wähner
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoophuguk
 
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...SnapLogic
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Amazon Web Services
 
The Impact of SMACT on the Data Management Stack
The Impact of SMACT on the Data Management StackThe Impact of SMACT on the Data Management Stack
The Impact of SMACT on the Data Management StackSnapLogic
 
Trivadis TechEvent 2016 DWH Modernization – in the Age of Big Data by Gregor ...
Trivadis TechEvent 2016 DWH Modernization – in the Age of Big Data by Gregor ...Trivadis TechEvent 2016 DWH Modernization – in the Age of Big Data by Gregor ...
Trivadis TechEvent 2016 DWH Modernization – in the Age of Big Data by Gregor ...Trivadis
 
Cloudera, Azure and Big Data at Cloudera Meetup '17
Cloudera, Azure and Big Data at Cloudera Meetup '17Cloudera, Azure and Big Data at Cloudera Meetup '17
Cloudera, Azure and Big Data at Cloudera Meetup '17Nathan Bijnens
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyDataStax
 

Was ist angesagt? (20)

VisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case studyVisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case study
 
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
 
Why Migrate from MySQL to Cassandra
Why Migrate from MySQL to CassandraWhy Migrate from MySQL to Cassandra
Why Migrate from MySQL to Cassandra
 
Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...
Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...
Real World Use Cases and Success Stories for In-Memory Data Grids (TIBCO Acti...
 
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsR, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
 
"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013

  • 1. Big data beyond Hadoop – How to integrate ALL your data Kai Wähner kwaehner@talend.com @KaiWaehner www.kai-waehner.de 9/24/2013
  • 2. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner. Kai Wähner: Consulting, Developing, Coaching, Speaking, Writing. Main tasks: Requirements Engineering, Enterprise Architecture Management, Business Process Management, Architecture and Development of Applications, Service-oriented Architecture, Integration of Legacy Applications, Cloud Computing, Big Data. Contact: Email: kontakt@kai-waehner.de; Blog: www.kai-waehner.de/blog; Twitter: @KaiWaehner; Social networks: Xing, LinkedIn
  • 3. Key messages: You have to care about big data to be competitive in the future! You have to integrate different sources to get the most value out of it! Big data integration is no (longer) rocket science!
  • 4. Agenda • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite • Custom big data components
  • 5. Agenda • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite • Custom big data components
  • 6. Why should you care about big data? “If you can't measure it, you can't manage it.” William Edwards Deming (1900–1993), American statistician, professor, author, lecturer and consultant
  • 7. Why should you care about big data? “Silence the HiPPOs” (highest-paid person's opinion): once you can interpret unimaginably large data streams, gut feeling is no longer justified!
  • 8. What is big data? The Vs of big data: Volume (terabytes, petabytes), Variety (social networks, blog posts, logs, sensors, etc.), Velocity (realtime or near-realtime), Value
  • 9. Big data tasks to solve before analysis: Big Data Integration (land data in a big data cluster; implement or generate parallel processes). Big Data Manipulation (simplify manipulation, such as sort and filter; computationally expensive functions). Big Data Quality & Governance (identify linkages and duplicates, validate big data; match component, execute basic quality features). Big Data Project Management (place frameworks around big data projects; common repository, scheduling, monitoring).
  • 10. Use case: Clickstream analysis. “The advantage of their new system is that they can now look at their data [from their log processing system] in anyway they want: ➜ Nightly MapReduce jobs collect statistics about their mail system such as spam counts by domain, bytes transferred and number of logins. ➜ When they wanted to find out which part of the world their customers logged in from, a quick [ad hoc] MapReduce job was created and they had the answer within a few hours. Not really possible in your typical ETL system.” http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
  • 11. Use case: Risk management. Deduce customer defections. http://hkotadia.com/archives/5021
  • 12. Use case: Flexible pricing. ➜ With revenue of almost USD 30 billion and a network of 800 locations, Macy's is considered the largest store operator in the USA ➜ Daily price check analysis of its 10,000 articles in less than two hours ➜ Whenever a neighboring competitor anywhere between New York and Los Angeles goes for aggressive price reductions, Macy's follows its example ➜ If there is no market competitor, the prices remain unchanged. http://www.t-systems.com/about-t-systems/examples-of-successes-companies-analyze-big-data-in-record-time-l-t-systems/1029702
  • 13. Storage: Compliance (Global Parcel Service). ➜ A lot of data must be stored “forever” ➜ Numbers increase exponentially ➜ Goal: as cheap as possible ➜ Problem: (fast) queries must still be possible ➜ Solution: commodity servers and “Hadoop querying” http://archive.org/stream/BigDataImPraxiseinsatz-SzenarienBeispieleEffekte/Big_Data_BITKOM-Leitfaden_Sept.2012#page/n0/mode/2up
  • 14. Agenda • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite • Custom big data components
  • 15. This is your company / Big Data Geek / Limited big data experts
  • 16. Data quality: Big Data + Poor Data Quality = Big Problems
  • 17. Big data tool selection (business perspective): ➜ Want to buy a big data solution for your industry? ➜ Maybe a competitor has a big data solution that adds business value? ➜ The competitor will never publish it (rat race)!
  • 18. Big data tool selection (technical perspective): Looking for ‘your’ required big data product? Support your data from scratch? Good luck!
  • 19. How to solve these big data challenges?
  • 20. Be no expert! Be simple! “[Often] simple models and big data trump more-elaborate [and complex] analytics approaches.” “Often someone coming from outside an industry can spot a better way to use big data than an insider.” (Erik Brynjolfsson / Lynn Wu) http://alfredopassos.tumblr.com/post/32461599327/big-data-the-management-revolution-by-andrew-mcafee
  • 21. Be creative! Look at use cases of others (SMEs, but also large companies). How can you do something similar with your data? You have different data sources? Use them! Combine them! Play with them!
  • 22. What is your big data process? Step 1: Do not begin with the data; think about business opportunities. Step 2: Choose the right data (combine different data sources). Step 3: Use easy tooling. http://hbr.org/2012/10/making-advanced-analytics-work-for-you
  • 23. Agenda • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite • Custom big data components
  • 24. Technology perspective: How to process big data?
  • 25. How to process big data? “The critical flaw in parallel ETL tools is the fact that the data is almost never local to the processing nodes. This means that every time a large job is run, the data has to first be read from the source, split N ways and then delivered to the individual nodes. Worse, if the partition key of the source doesn’t match the partition key of the target, data has to be constantly exchanged among the nodes. In essence, parallel ETL treats the network as if it were a physical I/O subsystem. The network, which is always the slowest part of the process, becomes the weakest link in the performance chain.” http://blog.syncsort.com/2012/08/parallel-etl-tools-are-dead
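The partition-key argument in the quote can be made concrete with a small counting sketch. It assumes plain hash partitioning (a key's hash modulo the node count) and a hypothetical `Row` with two candidate keys; it only counts how many rows land on a different node when the target is partitioned by a different key than the source, i.e. the rows that must cross the network. It illustrates the idea and is not how any particular ETL tool partitions data.

```java
import java.util.List;
import java.util.function.Function;

/** Counts rows that would cross the network when source and target use different partition keys. */
class RepartitionCost {

    /** Hypothetical order row with two candidate partition keys. */
    record Row(String customerId, String region) {}

    /** Assumed hash partitioning: the node (0..nodes-1) that owns a key. */
    static int nodeFor(String key, int nodes) {
        return Math.floorMod(key.hashCode(), nodes);
    }

    /** Number of rows whose owning node differs between the source and the target partitioning. */
    static <T> long rowsToMove(List<T> rows, Function<T, String> sourceKey,
                               Function<T, String> targetKey, int nodes) {
        return rows.stream()
                .filter(r -> nodeFor(sourceKey.apply(r), nodes) != nodeFor(targetKey.apply(r), nodes))
                .count();
    }
}
```

With the same key on both sides the count is always zero; that is the locality Hadoop aims for by scheduling map tasks on the nodes that already store the HDFS blocks.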
  • 26. How to process big data? Slides: http://www.slideshare.net/pavlobaron/100-big-data-0-hadoop-0-java Video: http://www.infoq.com/presentations/Big-Data-Hadoop-Java
  • 27. How to process big data? The de facto standard for big data processing
  • 28. How to process big data? Even Microsoft (the .NET house) has relied on Hadoop since 2011: “A big part of [the company’s strategy] includes wiring SQL Server 2012 (formerly known by the codename ‘Denali’) to the Hadoop distributed computing platform, and bringing Hadoop to Windows Server and Azure”
  • 29. What is Hadoop? Apache Hadoop, an open-source software library, is a framework that allows for the distributed processing of large data sets across clusters of commodity hardware using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
  • 30. Simple example (from the book “Hadoop: The Definitive Guide, 3rd Edition”) • Input: (very large) text files with lists of strings, such as: “318, 0043012650999991949032412004...0500001N9+01111+99999999999...” • We are interested in only some of the content: year and temperature (marked in red on the slide) • The MapReduce job has to compute the maximum temperature for every year • Phases: Map, (Shuffle), Reduce
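A minimal, dependency-free sketch of the three phases for the maximum-temperature example follows. It is plain Java rather than the Hadoop `Mapper`/`Reducer` API, and the field offsets (year at characters 15 to 19, signed temperature in tenths of a degree Celsius at 87 to 92, quality code at 92) follow the NCDC fixed-width layout used in the book; treat them as assumptions. `syntheticRecord` is a hypothetical helper that fabricates records with only those fields set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Plain-Java simulation of the max-temperature job: map, shuffle (group by key), reduce. */
class MaxTemperature {
    static final int MISSING = 9999; // NCDC sentinel for "no reading"

    /** Map phase: extract (year, temperature) from one fixed-width record, skipping bad readings. */
    static Optional<Map.Entry<String, Integer>> map(String line) {
        String year = line.substring(15, 19);
        String raw = line.substring(87, 92); // sign + 4 digits, tenths of a degree
        int temperature = Integer.parseInt(raw.startsWith("+") ? raw.substring(1) : raw);
        String quality = line.substring(92, 93);
        if (temperature == MISSING || !quality.matches("[01459]")) {
            return Optional.empty();
        }
        return Optional.of(Map.entry(year, temperature));
    }

    /** Reduce phase: the maximum of all temperatures emitted for one year. */
    static int reduce(List<Integer> temperatures) {
        return Collections.max(temperatures);
    }

    /** Map every line, group values by year in key order (imitating the shuffle), then reduce. */
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (String line : lines) {
            map(line).ifPresent(kv ->
                    shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue()));
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffled.forEach((year, temps) -> result.put(year, reduce(temps)));
        return result;
    }

    /** Hypothetical helper: a synthetic 93-character record with only year, temperature and quality set. */
    static String syntheticRecord(String year, String signedTemp, char quality) {
        char[] chars = new char[93];
        Arrays.fill(chars, '0');
        year.getChars(0, 4, chars, 15);
        signedTemp.getChars(0, 5, chars, 87);
        chars[92] = quality;
        return new String(chars);
    }
}
```

In a real Hadoop job the framework performs the shuffle, sorting and grouping map output by key across the cluster; the `TreeMap` only imitates that sorted grouping in memory.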
  • 31. How to process big data?
  • 32. Agenda • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite • Custom big data components
  • 33. Alternatives for systems integration: the complexity of integration rises from low to high across Integration Framework, Enterprise Service Bus and Integration Suite. Integration framework: connectivity, routing, transformation. Enterprise Service Bus: integration + tooling, monitoring, support. Integration suite: additionally business process mgt., big data / MDM, registry / repository, rules engine, “you name it”.
  • 34. Alternatives for systems integration: complexity of integration (low to high): Integration Framework, Enterprise Service Bus, Integration Suite
  • 35. More details about integration frameworks... http://www.kai-waehner.de/blog/2012/12/20/showdown-integration-framework-spring-integration-apache-camel-vs-enterprise-service-bus-esb/
  • 36. More details about integration frameworks... or come to my JavaOne session tomorrow and see an updated version of the slides! Wednesday, 11:30 – 12:30 PM, CON1934: Which Integration Framework to Choose?
  • 37. Enterprise Integration Patterns (EIP): Apache Camel implements the EIPs
  • 38. Enterprise Integration Patterns (EIP)
  • 39. Enterprise Integration Patterns (EIP)
  • 40. Architecture: http://java.dzone.com/articles/apache-camel-integration
  • 41. Choose your required components: HTTP, FTP, File, XSLT, MQ, JDBC, Akka, TCP, SMTP, RSS, Quartz, Log, LDAP, JMS, EJB, AMQP, Atom, AWS-S3, Bean-Validation, CXF, IRC, Jetty, JMX, Lucene, Netty, RMI, SQL, many many more, custom components
  • 42. Choose your favorite DSL: XML (not production-ready yet)
  • 43. Deploy it wherever you need: standalone, OSGi, application server, web container, Spring container, cloud
  • 44. Enterprise-ready • Open source • Scalability • Error handling • Transactions • Monitoring • Tooling • Commercial support
  • 45. Example: Camel integration route
  • 46. Hadoop integration with Apache Camel
  • 47. camel-hdfs component:
      // Inside a RouteBuilder's configure() method
      // Producer: poll the FTP server and append each downloaded file to myFile.txt in HDFS
      from("ftp://user@myServer?password=secret")
          .to("hdfs:///myDirectory/myFile.txt?append=true");
      // Consumer: read the analysis result from HDFS and write it to a local report file
      // (the file endpoint names a directory; fileName sets the name of the written file)
      from("hdfs:///myDirectory/myBigDataAnalysis.csv")
          .to("file:target/reports?fileName=report.csv");
  • 48. camel-hbase component
  • 49. Real world use case: Storage: Compliance (Global Parcel Service). ➜ A lot of data must be stored “forever” ➜ Numbers increase exponentially ➜ Goal: as cheap as possible ➜ Problem: (fast) queries must still be possible ➜ Solution: commodity servers and “Hadoop querying” http://archive.org/stream/BigDataImPraxiseinsatz-SzenarienBeispieleEffekte/Big_Data_BITKOM-Leitfaden_Sept.2012#page/n0/mode/2up
  • 50. Real world use case: Storage: Compliance. Diagram: Orders (Server 1), Payments (Server 2), Log Files (Server 3), ..., Log Files (Server 100) → ETL → Storage → Query
  • 51. Live demo: Apache Camel in action...
  • 52. camel-pig? camel-hive? camel-hcatalog? Not available yet (current Camel version: 2.12). Workarounds: • Use Pig / Hive query scripts (via the camel-exec component or any scripting language) • Build your own component (more details later...) • Use the Hive-HBase integration and store data in HBase (“ugly”)
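One way to read the first workaround: let the route start the Hive or Pig command-line client with a script. With camel-exec this would be an endpoint along the lines of `exec:hive?args=-f report.hql` (check the option names against your Camel version). The sketch below shows the same idea with only `java.lang.ProcessBuilder`; the script names and variables are hypothetical, and `hive`/`pig` must be installed and on the `PATH` for `run` to do anything useful.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Builds and runs Hive/Pig CLI commands for a script, as a route-triggered workaround might. */
class ScriptRunner {

    /** e.g. [hive, -f, report.hql, --hivevar, day=2013-09-24] */
    static List<String> hiveCommand(String scriptPath, Map<String, String> vars) {
        List<String> cmd = new ArrayList<>(List.of("hive", "-f", scriptPath));
        vars.forEach((name, value) -> {
            cmd.add("--hivevar"); // referenced as ${hivevar:name} inside the script
            cmd.add(name + "=" + value);
        });
        return cmd;
    }

    /** e.g. [pig, -param, input=/logs, -f, clicks.pig] */
    static List<String> pigCommand(String scriptPath, Map<String, String> params) {
        List<String> cmd = new ArrayList<>(List.of("pig"));
        params.forEach((name, value) -> {
            cmd.add("-param"); // referenced as $name inside the script
            cmd.add(name + "=" + value);
        });
        cmd.add("-f");
        cmd.add(scriptPath);
        return cmd;
    }

    /** Starts the process, forwards its output to this JVM's console and returns the exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

Passing arguments as a list instead of one shell string avoids quoting problems when variable values contain spaces.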
  • 53. camel-avro component: Avro is a data serialization system used in Apache Hadoop. The camel-avro component provides a data format for Avro, which allows serialization and deserialization of messages using Apache Avro's binary data format. Moreover, it supports Apache Avro's RPC by providing producer and consumer endpoints for using Avro over Netty or HTTP. Camel is not just about connectors... Camel supports a pluggable DataFormat to allow messages to be marshalled to and unmarshalled from binary or text formats, e.g. CSV, JSON, SOAP, EDI, ZIP, or ...
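To illustrate what a pluggable data format does, here is a deliberately simplified stand-in. `SimpleDataFormat` and `NaiveCsvFormat` are hypothetical names; Camel's real `org.apache.camel.spi.DataFormat` works on an `Exchange` and streams (`marshal(Exchange, Object, OutputStream)` and `unmarshal(Exchange, InputStream)`), and production CSV handling needs quoting rules that this sketch omits.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical, simplified stand-in for a pluggable data format (not Camel's DataFormat interface). */
interface SimpleDataFormat<T> {
    byte[] marshal(T value);   // object -> wire bytes
    T unmarshal(byte[] bytes); // wire bytes -> object
}

/** Naive CSV format: assumes values never contain commas, quotes or line breaks. */
class NaiveCsvFormat implements SimpleDataFormat<List<List<String>>> {

    @Override
    public byte[] marshal(List<List<String>> rows) {
        String text = rows.stream()
                .map(row -> String.join(",", row))
                .collect(Collectors.joining("\n"));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public List<List<String>> unmarshal(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_8);
        return text.lines()
                .map(line -> Arrays.asList(line.split(",", -1))) // -1 keeps trailing empty fields
                .collect(Collectors.toList());
    }
}
```

Because routes depend only on the format abstraction, swapping CSV for JSON or Avro means plugging in another implementation rather than changing the route.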
  • 54. Agenda • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite • Custom big data components
  • 55. Alternatives for systems integration: the complexity of integration rises from low to high across Integration Framework, Enterprise Service Bus and Integration Suite. Integration framework: connectivity, routing, transformation. Enterprise Service Bus: integration + tooling, monitoring, support. Integration suite: additionally business process mgt., big data / MDM, registry / repository, rules engine, “you name it”.
  • 56. Alternatives for systems integration: complexity of integration (low to high): Integration Framework, Enterprise Service Bus, Integration Suite
  • 57. More details about ESBs and suites... http://www.kai-waehner.de/blog/2013/01/23/spoilt-for-choice-how-to-choose-the-right-enterprise-service-bus-esb/
  • 58. Hadoop integration with Talend Open Studio
  • 59. Vision: Democratize big data. …an open source ecosystem: Talend Open Studio for Big Data • Improves efficiency of big data job design with a graphical interface • Generates Hadoop code and runs transforms inside Hadoop • Native support for HDFS, Pig, HBase, HCatalog, Sqoop and Hive • 100% open source under an Apache License • Standards-based
  • 60. Vision: Democratize big data. …an open source ecosystem: Talend Platform for Big Data • Builds on Talend Open Studio for Big Data • Adds data quality, advanced scalability and management functions • MapReduce massively parallel data processing • Shared repository and remote deployment • Data quality and profiling • Data cleansing • Reporting and dashboards • Commercial support, warranty/IP indemnity under a subscription license
  • 61. Talend Open Studio for Big Data
  • 62. Real world use case: Clickstream analysis. “The advantage of their new system is that they can now look at their data [from their log processing system] in anyway they want: ➜ Nightly MapReduce jobs collect statistics about their mail system such as spam counts by domain, bytes transferred and number of logins. ➜ When they wanted to find out which part of the world their customers logged in from, a quick [ad hoc] MapReduce job was created and they had the answer within a few hours. Not really possible in your typical ETL system.” http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
  • 63. Real world use case: Clickstream analysis. Diagram: Log Files (Server 1), Log Files (Server 2), ..., Log Files (Server 100) → ETL
  • 64. Potential uses of clickstream data: One of the original uses of Hadoop at Yahoo was to store and process their massive volume of clickstream data. Now enterprises of all types can use Hadoop to refine and analyze clickstream data. They can then answer business questions such as: • What is the most efficient path for a site visitor to research a product, and then buy it? • What products do visitors tend to buy together, and what are they most likely to buy in the future? • Where should I spend resources on fixing or enhancing the user experience on my website? Goal: data visualization can help you optimize your website and convert more visits into sales and revenue. Source for the clickstream example: “Hortonworks Hadoop Tutorials - Real Life Use Cases” http://hortonworks.com/blog/hadoop-tutorials-real-life-use-cases-in-the-sandbox
  • 65. Example: A semi-structured log file
  • 66. Example: ETL job “... using Talend’s HDFS and Hive components”
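The core of the clickstream refinement step, joining raw log lines with user and product reference data and counting hits, can be sketched in plain Java. The tab-separated log layout (timestamp, user id, URL) and the two lookup maps are hypothetical stand-ins for the tutorial's log file and reference tables; in the Talend job this work runs as HDFS and Hive steps rather than in memory.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of clickstream refinement: parse log lines, join with reference data, count hits. */
class ClickstreamHits {

    /** Page hits per state and product category; lines that cannot be parsed or joined are skipped. */
    static Map<String, Map<String, Integer>> hitsByStateAndCategory(
            List<String> logLines, Map<String, String> stateByUser, Map<String, String> categoryByUrl) {
        Map<String, Map<String, Integer>> hits = new TreeMap<>();
        for (String line : logLines) {
            String[] fields = line.split("\t"); // assumed layout: timestamp, userId, url
            if (fields.length < 3) {
                continue; // malformed line
            }
            String state = stateByUser.get(fields[1]);      // join on user id
            String category = categoryByUrl.get(fields[2]); // join on requested URL
            if (state == null || category == null) {
                continue; // unknown user or non-product page
            }
            hits.computeIfAbsent(state, s -> new TreeMap<>()).merge(category, 1, Integer::sum);
        }
        return hits;
    }
}
```

The nested map is the kind of small, aggregated result that a BI tool such as Excel or Tableau can then chart.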
  • 67. Example: ETL job “... using Talend’s MapReduce components*” (* not available in the open source version of Talend Studio)
  • 68. Live demo: “Talend Open Studio for Big Data” in action...
  • 69. Example: Analysis with Microsoft Excel. We can see that the largest number of page hits in Florida were for clothing, followed by shoes. Source for the clickstream example: “Hortonworks Hadoop Tutorials - Real Life Use Cases” http://hortonworks.com/blog/hadoop-tutorials-real-life-use-cases-in-the-sandbox
  • 70. Example: Analysis with Microsoft Excel. The chart shows that the majority of men shopping for clothing on our website are between the ages of 22 and 30. With this information, we can optimize our content for this market segment. Source for the clickstream example: “Hortonworks Hadoop Tutorials - Real Life Use Cases” http://hortonworks.com/blog/hadoop-tutorials-real-life-use-cases-in-the-sandbox
  • 71. Example: Analysis with Tableau. Spoilt for choice: use your preferred BI or analysis tool!
  • 72. Agenda • Big data paradigm shift • Challenges of big data • Big data from a technology perspective • Integration with an open source framework • Integration with an open source suite • Custom big data components
  • 73. Custom components: easy to realize for all integration alternatives* • Integration framework • Enterprise Service Bus • Integration suite (* at least for open source solutions)
  • 74. Custom components: You might need a ... • ... Hive component for Camel • ... Impala component for Talend • ... custom component for your internal data format
  • 75. Live demo (example: Apache Camel): Custom components in action...
  • 76. Alternatives to custom components • SOAP • REST
  • 77. Code example: REST API for Salesforce object store
      // Inside a RouteBuilder's configure() method; accessToken holds the OAuth access token.
      // setHeader(name, value) expects a Camel Expression, so literal values are wrapped in constant(...).
      // Salesforce query (SOQL) via REST API
      from("direct:salesforceViaHttpLIST")
          .setHeader("X-PrettyPrint", constant(1))
          .setHeader("Authorization", constant(accessToken))
          .setHeader(Exchange.CONTENT_TYPE, constant("application/json"))
          .to("https://na14.salesforce.com/services/data/v20.0/query?q=SELECT+name+from+Article__c");
      // Salesforce CREATE via REST API
      from("direct:salesforceViaHttpCREATE")
          .setHeader("X-PrettyPrint", constant(1))
          .setHeader("Authorization", constant(accessToken))
          .setHeader(Exchange.CONTENT_TYPE, constant("application/json"))
          .to("https://na14.salesforce.com/services/data/v20.0/sobjects/Article__c");
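For comparison outside Camel, the query request from the slide can be assembled with the JDK's `java.net.http` API (Java 11+). The sketch only builds the request, so it runs without a Salesforce org. The `Bearer` prefix on the `Authorization` header is an assumption about how the access token is passed (the slide passes `accessToken` as is), and the instance URL and API version are taken from the slide rather than verified.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Builds (but does not send) the SOQL query request from the slide with the JDK HTTP client API. */
class SalesforceQueryRequest {

    static HttpRequest build(String instanceUrl, String apiVersion, String soql, String accessToken) {
        // URLEncoder uses form encoding, so spaces become '+', matching the slide's query string
        String q = URLEncoder.encode(soql, StandardCharsets.UTF_8);
        URI uri = URI.create(instanceUrl + "/services/data/" + apiVersion + "/query?q=" + q);
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Bearer " + accessToken) // assumption: OAuth bearer token
                .header("X-PrettyPrint", "1")
                .header("Content-Type", "application/json")
                .GET()
                .build();
    }
}
```

Sending it would be a matter of `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.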
  • 78. Did you get the key message?
  • 79. Key messages: You have to care about big data to be competitive in the future! You have to integrate different sources to get the most value out of it! Big data integration is no (longer) rocket science!
  • 80. Did you get the key message?
  • 81. Thank you for your attention. Questions? kwaehner@talend.com www.kai-waehner.de LinkedIn / Xing @KaiWaehner