A real-time architecture using
Hadoop and Storm.
Speaker

Nathan Bijnens
@nathan_gs

A real-time architecture using Hadoop & Storm. #JaxLondon

Our Vision

Big Data: Volume, Velocity, Variety

Computing Trends

Past                                 Current
Computation (CPUs) Expensive         Computation Cheap (Many Core Computers)
Disk Storage Expensive               Disk Storage Cheap (Cheap Commodity Disks)
DRAM Expensive                       DRAM / SSD Getting Cheap
Coordination Easy                    Coordination Hard
(Latches Don't Often Hit)            (Latches Stall a Lot, etc)

Source: Immutability Changes Everything - Pat Helland, RICON2012
Credits
Nathan Marz
Ex-Backtype & Twitter
Startup in Stealthmode
Storm
Cascalog
ElephantDB

manning.com/marz

A Data System

Data is more than Information

Not all information is equal.
Some information is derived from other pieces of
information.

Data is more than Information

Eventually you will reach the most raw form of information.
This is the information you hold true, simply because it exists.

Events - Before

Events used to manipulate the
master data.

Events - After

Today, events are the master
data.

Data System

Store everything.

Events

Data is Immutable

Events

Data is Time Based

Capturing change traditionally

Person   Location              Person   Location
Nathan   Antwerp        ->     Nathan   Ghent
Geert    Dendermonde           Geert    Dendermonde
John     Ghent                 John     Ghent

Capturing change

Person   Location      Timestamp            Person   Location      Time
Nathan   Antwerp       2005-01-01           Nathan   Antwerp       2005-01-01
Geert    Dendermonde   2011-10-08     ->    Geert    Dendermonde   2011-10-08
John     Ghent         2010-05-02           John     Ghent         2010-05-02
                                            Nathan   Ghent         2013-02-03
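
The idea of recording a move as a new timestamped fact, instead of overwriting the old row, can be sketched in a few lines of Python. This is only an illustration under assumptions: the `events` list stands in for the master data set, and `location_at` is a hypothetical helper, not something from the talk.

```python
from datetime import date

# Append-only event log mirroring the slide's table: (person, location, since).
events = [
    ("Nathan", "Antwerp", date(2005, 1, 1)),
    ("Geert", "Dendermonde", date(2011, 10, 8)),
    ("John", "Ghent", date(2010, 5, 2)),
]

# A move is recorded as a new fact; nothing is updated in place.
events.append(("Nathan", "Ghent", date(2013, 2, 3)))


def location_at(person, when, log):
    """Return where `person` lived on `when`, or None if nothing is known."""
    known = [(since, loc) for who, loc, since in log if who == person and since <= when]
    return max(known)[1] if known else None


print(location_at("Nathan", date(2010, 1, 1), events))  # Antwerp
print(location_at("Nathan", date(2014, 1, 1), events))  # Ghent
```

Because the old fact is never deleted, both the current and any historical location remain answerable from the same log.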

Query

The data you query is often transformed,
aggregated, ...

Query

Query = function ( all data )

Number of people living in each city.

Person   Location      Time                 Location      Count
Nathan   Antwerp       2005-01-01           Ghent         2
Geert    Dendermonde   2011-10-08     ->    Dendermonde   1
John     Ghent         2010-05-02
Nathan   Ghent         2013-02-03
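
As a rough sketch of "Query = function(all data)", the count per city can be written as a pure function over the whole event log. The function name `people_per_city` and the tuple layout are assumptions made for this example; the rows are the ones from the slide.

```python
from collections import Counter
from datetime import date

all_data = [
    ("Nathan", "Antwerp", date(2005, 1, 1)),
    ("Geert", "Dendermonde", date(2011, 10, 8)),
    ("John", "Ghent", date(2010, 5, 2)),
    ("Nathan", "Ghent", date(2013, 2, 3)),
]


def people_per_city(log):
    """Count people by their latest known location."""
    latest = {}
    for person, location, _since in sorted(log, key=lambda event: event[2]):
        latest[person] = location  # later facts win over earlier ones
    return Counter(latest.values())


print(people_per_city(all_data))
```

Run on the four rows above, this gives 2 for Ghent and 1 for Dendermonde, the same counts as on the slide. Running it again on the same input always gives the same answer, which is what makes precomputing it safe.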

Query

All Data -> Query

Query: Precompute

All Data -> Precomputed View -> Query

Layered Architecture

Batch Layer

Speed Layer

Serving Layer

Layered Architecture

Query

Cassandra

Incoming Data
Hadoop

ElephantDB

Batch Layer

Batch Layer

Incoming Data
Hadoop

ElephantDB

Batch Layer

Unrestrained computation.

Batch Layer

No need to De-Normalize.

Batch Layer

Horizontally scalable.

Batch Layer

High latency.
For now, pretend that it doesn't matter.

Batch Layer

Functional computation, based on
immutable inputs, is idempotent.

Batch Layer

Stores master copy of data set...
append only.

Batch Layer

Batch: View generation

Master Dataset -> MapReduce -> View #1, View #2, View #3

MapReduce

MAP
1. Take a large data set and divide it into subsets
2. Perform the same function on all subsets: DoWork(), DoWork(), DoWork(), …

REDUCE
3. Combine the output from all subsets

Output
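
The three steps can be sketched in plain Python. This is a single-process illustration only, not Hadoop's API: `split`, `do_work`, and `combine` are hypothetical names chosen to match the slide, and a real MapReduce job would run `do_work` on many machines in parallel.

```python
from collections import defaultdict


def split(records, n_subsets):
    """Step 1: divide the data set into subsets."""
    return [records[i::n_subsets] for i in range(n_subsets)]


def do_work(subset):
    """Step 2 (map): the same function runs on every subset."""
    return [(location, 1) for _person, location in subset]


def combine(mapped_outputs):
    """Step 3 (reduce): merge the outputs of all subsets by key."""
    totals = defaultdict(int)
    for output in mapped_outputs:
        for key, value in output:
            totals[key] += value
    return dict(totals)


records = [("Nathan", "Ghent"), ("Geert", "Dendermonde"), ("John", "Ghent")]
result = combine(do_work(subset) for subset in split(records, 2))
print(result)
```

The result does not depend on how the records are split, which is what lets the map step scale out across machines.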

Serialization & Schema

Catch errors as quickly as they happen.
Validation on write vs on read.

Serialization & Schema

CSV is actually a serialization language that is just
poorly defined.

Serialization & Schema
Use a format with a schema.
- Thrift
- Avro
- Protocol Buffers
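
To illustrate validation on write, here is a minimal sketch using only the Python standard library. It is not Thrift, Avro, or Protocol Buffers; `LocationEvent` and `append_event` are hypothetical names, and a real system would declare the schema in one of the formats listed above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LocationEvent:
    """Hypothetical record type standing in for a schema definition."""
    person: str
    location: str
    since: date

    def __post_init__(self):
        # Validation on write: reject malformed records at construction time.
        for name in ("person", "location"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value:
                raise ValueError(f"{name} must be a non-empty string")
        if not isinstance(self.since, date):
            raise ValueError("since must be a date")


def append_event(master_data, event):
    """Only schema-checked records may enter the master data set."""
    if not isinstance(event, LocationEvent):
        raise TypeError("only LocationEvent records may be written")
    master_data.append(event)


master_data = []
append_event(master_data, LocationEvent("Nathan", "Ghent", date(2013, 2, 3)))
```

A bad record fails at the moment it is written, close to the code that produced it, rather than hours later inside a batch job that reads it.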

Batch View Database

Read only database.
No random writes required.

Batch View Database

Every iteration produces the
Views from scratch.

Batch View Database
ElephantDB
Splout
Voldemort

Batch Layer

Timeline (Time, up to Now): Data absorbed into Batch Views | Not yet absorbed: just a few hours of data.

Speed Layer

Overview
Cassandra

Incoming Data
Hadoop

ElephantDB

Speed Layer

Stream processing.

Speed Layer

Continuous computation.

Speed Layer

Transactional.

Speed Layer

Storing a limited window of data.
Compensating for the last few hours of data.
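
A limited window that only covers what the batch layer has not yet absorbed could look roughly like this. `RealtimeView` is a hypothetical in-memory class made up for this example (the talk stores such views in a database), and the sketch assumes events arrive in timestamp order.

```python
from collections import Counter, deque


class RealtimeView:
    """Hypothetical speed-layer view: counts per key for not-yet-absorbed events."""

    def __init__(self):
        self.window = deque()    # (timestamp, key), oldest first
        self.counts = Counter()  # incrementally maintained view

    def on_event(self, timestamp, key):
        # Incremental update: one event in, one small change to the view.
        self.window.append((timestamp, key))
        self.counts[key] += 1

    def expire(self, absorbed_until):
        # Once the batch views cover everything up to `absorbed_until`,
        # the speed layer can forget those events.
        while self.window and self.window[0][0] <= absorbed_until:
            _timestamp, key = self.window.popleft()
            self.counts[key] -= 1
            if self.counts[key] == 0:
                del self.counts[key]


view = RealtimeView()
view.on_event(1, "Ghent")
view.on_event(2, "Ghent")
view.on_event(3, "Antwerp")
view.expire(absorbed_until=1)
print(dict(view.counts))
```

After the expiry only the events with timestamps 2 and 3 remain in the window, so the view stays small no matter how much history the batch layer holds.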

Speed Layer

All the complexity is isolated in the Speed layer.
If anything goes wrong, it is auto-corrected.

CAP
You have a choice between:
- Availability: queries are eventually consistent.
- Consistency: queries are consistent.

Eventual accuracy

Some algorithms are hard to implement
in real time. For those cases we could
estimate the results.

Speed Layer

Incoming Data -> Real Time View 1, Real Time View 2

Storm
Message passing.
Distributed processing.
Horizontally scalable.
Incremental algorithms.
Fast.
Data in motion.

Storm

Cluster diagram: Nimbus, ZooKeeper, and three Worker Nodes, each with a Supervisor and several Executors.

Storm
Tuple

Stream

Storm
Spout

Bolt

Storm
Grouping

Data Ingestion
Kafka
Flume
Scribe
*MQ
Kestrel

Speed Layer Views
The views are stored in a read & write database.
- Cassandra
- HBase
- Redis
- MySQL
- ElasticSearch

Much more complex than a read only view.

Serving Layer

Overview

Query

Cassandra

Incoming Data
Hadoop

ElephantDB

Serving Layer

Random reads

Serving Layer

This layer queries the Batch & Real Time
views and merges them.

Serving Layer

Batch Views + Real Time Views -> Merge

Serving Layer

How to query an Average?
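
One common answer, sketched here as an assumption rather than as the talk's own solution, is to store a sum and a count in each view instead of the average itself, and to divide only after merging. The view contents below are made-up illustrative numbers.

```python
def merge_average(batch, realtime):
    """Merge two {sum, count} views and return the overall average."""
    total = batch["sum"] + realtime["sum"]
    count = batch["count"] + realtime["count"]
    return total / count if count else None


# Illustrative values only: batch average is 30.0, real-time average is 10.0.
batch_view = {"sum": 900.0, "count": 30}
realtime_view = {"sum": 100.0, "count": 10}

print(merge_average(batch_view, realtime_view))  # 25.0
```

Averaging the two averages would give 20.0, which is wrong because the batch view covers three times as many records. Sums and counts merge by simple addition, so they are safe to precompute separately in each layer.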

Overview

Overview

Query

Cassandra

Incoming Data
Hadoop

ElephantDB

Lambda Architecture

Lambda Architecture

Can discard any view, batch and real time,
and just recreate everything from the master
data.

Lambda Architecture

Mistakes are corrected via recomputation.
Write bad data? Remove the data & recompute.
Bug in view generation? Just recompute the view.

Lambda Architecture

Data storage is highly optimized.

Lambda Architecture

Immutability changes everything.

Questions?

@nathan_gs & #BigDataCon13

DataCrunchers
We enable companies in envisioning, defining and
implementing a data strategy.
A one-stop-shop for all your Big Data needs.
The first Big Data Consultancy agency in Belgium.

Thank you

@nathan_gs

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 

Kürzlich hochgeladen (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 

A real-time architecture using Hadoop and Storm @ JAX London

Editor's Notes

  1. 1
  2. 2
  3. How much data do you have? 44 times as much data in the next decade; 15 Zb in 2015. Data silos (ERP, CRM, …). Customers: Trimble (3 Tb in their database system), Truvo (changing an index takes 24 hours). Traditional systems cannot handle this volume. Turn 12 terabytes of tweets created each day into improved product sentiment analysis. Convert 350 billion annual meter readings to better predict power consumption. 3
  4. Real time: time-sensitive decision making. Fraud detection, energy allocation, marketing campaigns, market transactions. Solution: real-time solutions in combination with batch (Hadoop); NoSQL systems. 4
  5. Structured vs. unstructured: 80% is unstructured data. A key drawback of using traditional relational database systems is that they're not good at handling variable data. A flexible data model: Word, email, photo, text, video, APIs, …? What are your needs regarding variety? The end result: bringing structure into unstructured data. Monitor hundreds of live video feeds from surveillance cameras to target points of interest. Exploit the 80% data growth in images, video and documents to improve customer satisfaction. 5
  6. We can afford to keep immutable copies of lots of data. We NEED immutability to coordinate with fewer challenges. Semaphores & locks are the things to avoid: instruction opportunities lost waiting for a semaphore increase with more cores… 6
  7. The # of followers on Twitter = all follows & unfollows combined. Another example: an account balance. 9
  8. Data = event. In an ever-changing world we found a 'safe haven' for data. Everything we do generates events: pay with a credit card, commit to Git, click on a webpage, tweet. 10
  9. It is easier to store all data in a cost-effective way. Compare to the DWH world. 13
  10. Immutability greatly restricts the range of errors that can cause data loss or data corruption. E.g. only CR, no more CRUD. Information might of course change. Fault tolerance: data loss (human error, hardware failure), data corruption. Parallel with functional programming. 14
  11. Allows state regeneration. E.g. what was my bank balance on 1 May 2005? 15
  12. Queries as pure functions that take all data as input is the most general formulation. Different functions may look at different portions and aggregate information in different ways. 19
  13. 22
  14. Too slow; might be petabyte scale. Impala/Drill: why not 23
  15. The batch layer can calculate anything (given enough time). 28
  16. The batch layer stores the data normalized, but in the views it generates, data is often, if not always, denormalized. 29
  17. Not vertically 30
  18. 31
  19. It’s OK to croak and restart 32
  20. Is something really immutable when its name can change? 33
  21. Doesn't have to be Hadoop. What matters here is a distributed FS combined with a processing framework. Spark, 34
  22. 35
  23. Source: PolybasePass2012.pptx http://whyjava.wordpress.com/2011/08/04/how-i-explained-mapreduce-to-my-wife/ 36
  24. http://www.quora.com/Apache-Hadoop/What-is-the-advantage-of-writing-custom-input-format-and-writable-versus-the-TextInputFormat-and-Text-writable/answer/Eric-Sammer?srid=PU&st=ns Value of schemas: structural integrity; guarantees on what can and can't be stored; prevents corruption. Otherwise you'll detect corruption issues at read time. 37
  25. http://www.quora.com/Apache-Hadoop/What-is-the-advantage-of-writing-custom-input-format-and-writable-versus-the-TextInputFormat-and-Text-writable/answer/Eric-Sammer?srid=PU&st=ns 38
  26. 39
  27. 40
  28. 41
  29. But this can be solved, e.g. with ES, by generating your views in advance. 42
  30. 43
  31. 47
  32. 48
  33. In some circumstances. 49
  34. 50
  35. All the complexity of *dealing* with the CAP theorem (like read repair) is isolated in the realtime layer. 51
  36. Consistency (all nodes see the same data at the same time). Availability (a guarantee that every request receives a response about whether it was successful or failed). Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system). http://codahale.com/you-cant-sacrifice-partition-tolerance/ HBase vs. Cassandra. 52
  37. E.g. unique counts, ML. 53
  38. 54
  39. Nimbus: manages the cluster. Worker node: Supervisor: manages workers; restarts them if needed. Executor: physical JVM process; executes tasks (those are spread evenly across the workers). Tasks: each in its own thread; a task is the actual Bolt or Spout and processes the stream. 56
  40. Tuple: named list of values; dynamically typed. Stream: sequence of tuples. 57
  41. Spout: source of streams; sometimes replayable. Bolt: stream transformations; at least 1 input stream; 0 - * output streams. 58
  42. 60
  43. 61
  44. The serving layer needs to be able to answer any query in a short amount of time. 64
  45. 65
  46. AVG: keep the sum and the count (AVG = sum / count); pre-aggregate, but not everything is possible. 67
  47. Lambda was first named by Alonzo Church; he needed a letter for functional abstraction in the theory of computation in the 1930s. 70
  48. High tolerance for human & system errors. 71
  49. http://www.quora.com/Apache-Hadoop/What-is-the-advantage-of-writing-custom-input-format-and-writable-versus-the-TextInputFormat-and-Text-writable/answer/Eric-Sammer?srid=PU&st=ns 72
  50. Data storage layer optimized independently from query resolution layer 73
  51. If you remember one thing about this presentation, it is: immutability. 74
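
Notes 7, 11 and 12 describe state as something derived from an immutable event log by a pure function over all data. The following is a minimal Python sketch of that idea, not code from the talk: the `Event` type, the `balance_on` function and the sample amounts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass(frozen=True)  # frozen: an event cannot be modified once recorded
class Event:
    day: date
    amount: int  # cents; positive = deposit, negative = withdrawal (assumed convention)


def balance_on(events: Iterable[Event], as_of: date) -> int:
    """Pure query over the whole log: sum every event dated on or before as_of."""
    return sum(e.amount for e in events if e.day <= as_of)


# Hypothetical append-only log; corrections are new events, never updates.
LOG = (
    Event(date(2005, 4, 1), 10_000),
    Event(date(2005, 4, 20), -2_500),
    Event(date(2005, 5, 3), 4_000),
)

print(balance_on(LOG, date(2005, 5, 1)))  # 7500
```

Because nothing in the log is ever overwritten, the same query can be re-run for any past date and always gives the same answer.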
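
Note 46 points out that an average cannot be merged directly, but its sum and count can be pre-aggregated and combined at query time. A minimal Python sketch under that assumption follows; the `Partial` type, the function names and the sample values are hypothetical, and the split into a batch view and a realtime view is only an illustration of how the two partial results might be merged.

```python
from typing import Iterable, NamedTuple


class Partial(NamedTuple):
    total: float
    count: int


def summarize(values: Iterable[float]) -> Partial:
    """Pre-aggregate a set of values into a mergeable (sum, count) pair."""
    values = list(values)
    return Partial(sum(values), len(values))


def merge(a: Partial, b: Partial) -> Partial:
    """Combine two partial aggregates; sums and counts simply add up."""
    return Partial(a.total + b.total, a.count + b.count)


def average(p: Partial) -> float:
    if p.count == 0:
        raise ValueError("average of an empty aggregate is undefined")
    return p.total / p.count


batch_view = summarize([10, 20, 30])  # assumed: precomputed over historical data
realtime_view = summarize([40])       # assumed: events since the last batch run

print(average(merge(batch_view, realtime_view)))  # 25.0
```

Averaging the two averages (20 and 40) would give 30, which is wrong; merging the sums and counts first gives the correct 25. Aggregates such as unique counts do not decompose this simply, which is the "not everything is possible" caveat.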