Building streaming
applications
and evolving them as the business changes
• Bol.com
• Personalization
• Streaming systems
• Change …
• Getting the message across
• Serialization
• Escaping
• Encoding
• Defining
• Evolving
• Conclusions
Agenda
TU-Delft Computer Science (MSc)
Nyenrode Business School (MSc)
Software Developer (USoft)
Research Scientist (NLR)
Infra Architect (NLR)
WebAnalytics Architect (Moniforce)
Lead IT Architect (Bol.com)
Contributor for Apache Hadoop,
Pig, HBase, Storm, Flink, Parquet, …
Apache Avro Committer & PMC
Niels Basjes
nbasjes@bol.com
@nielsbasjes
https://github.com/nielsbasjes
bol.com
> 18 million products for sale
~ 60 million in catalog
> 8.9 million active customers
> 55 million visits per month
> 6000 million
pageviews/year
Season 2017
~ 16,000,000 presents
Remember
Related
New service in my region
Wishlist
Overall campaign
Based on previous behavior/purchases
Personalization
Batch processing
Data relevance decay
(chart: value of the data versus age of the data, with the value dropping off over minutes, days and weeks)
Stream processing
• Create the best possible
interaction data
• More details on YouTube:
niels basjes bbuzz 2016
Measuring 2.0
(Architecture diagram: the browser renders the HTML and JavaScript of the WebShop; Measuring 2.0 sends events, personal and anonymized, to a Measure endpoint, into Kafka, through a Sessionizer and an Anonymize step, and on to Kafka topics and files for Search Suggest, RECO, Analytics, Fraud prevention, …)
~ 800M events/day
~ 1.5 TiB/day
Using the Measurements
Measuring 2.0
Personalization
Search Suggestion
Recommendations
Fraud prevention
Kafka
Future of services
• Many will do what Measuring 2.0 is doing today.
• Streaming interfaces
• Low latency
• Very large (extreme) volume
• Today ~ 10,000 messages/sec
• Next year > 100,000 messages/sec
How do we build such an interface?
Streaming applications
Data producer Streaming Interface Data consumers
Data consumers
Data consumers
Data consumers
The real
payload is
“byte array”
Kafka producer API
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
// The serializers tell Kafka how to turn the key and value objects into bytes.
props.put("key.serializer",
    "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer",
    "org.apache.kafka.common.serialization.StringSerializer");

Producer<String, String> producer = new KafkaProducer<>(props);
for (int i = 0; i < 100; i++) {
    producer.send(new ProducerRecord<String, String>("my-topic",
        Integer.toString(i),
        Integer.toString(i)));
}
producer.close();
https://kafka.apache.org/10/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html
… instruct how to turn the key and
value objects … into bytes.
PubSub producer API
https://cloud.google.com/pubsub/docs/quickstart-client-libraries#pubsub-client-libraries-java
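The slide only carries the quickstart link; the code screenshot is not part of this transcript. For reference, a minimal publish sketch against the Google Cloud Pub/Sub Java client from that quickstart (project and topic names are placeholders), making the same point as the Kafka example: the payload the client takes is ultimately just bytes.

import com.google.api.core.ApiFuture;
import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;

// Publisher.newBuilder(...).build() throws IOException; error handling omitted for brevity.
Publisher publisher = Publisher.newBuilder(TopicName.of("my-project", "my-topic")).build();
try {
    for (int i = 0; i < 100; i++) {
        // Just like Kafka: the message body is a byte array (wrapped in a ByteString).
        ByteString data = ByteString.copyFromUtf8(Integer.toString(i));
        PubsubMessage message = PubsubMessage.newBuilder().setData(data).build();
        ApiFuture<String> messageId = publisher.publish(message);   // server-assigned id, delivered asynchronously
    }
} finally {
    publisher.shutdown();
}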
So we need something to
• serialize records into bytes
Records: never just a string
Person = {
name = ‘Niels Basjes <"Hacker">’
occupation = ‘IT-Architect’
home = {
city = ‘Amstelveen’
}
company = {
name = ‘bol.com’
city = ‘Utrecht’
}
}
JSON?
{
"person": {
"name": "Niels Basjes <"Hacker">",
"occupation": "IT - Architect",
"home": {
"city": "Amstelveen"
},
"company": {
"name": "bol.com",
"city": "Utrecht"
}
}
}
Escaping !
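The deck does not name a JSON library, but the practical fix for the escaping problem shown above is to never build JSON by string concatenation; any real JSON serializer escapes the embedded quotes for you. An illustrative sketch using Jackson (the Person class here is invented for the example):

import com.fasterxml.jackson.databind.ObjectMapper;

public class Person {
    public String name = "Niels Basjes <\"Hacker\">";
    public String occupation = "IT-Architect";
}

// writeValueAsString throws JsonProcessingException; handling omitted for brevity.
ObjectMapper mapper = new ObjectMapper();
String json = mapper.writeValueAsString(new Person());
// The inner quotes come out escaped: {"name":"Niels Basjes <\"Hacker\">","occupation":"IT-Architect"}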
XML?
<person>
<name>Niels Basjes &lt;"Hacker"&gt;</name>
<occupation>IT-Architect</occupation>
<home>
<city>Amstelveen</city>
</home>
<company>
<name>bol.com</name>
<city>Utrecht</city>
</company>
</person>
Escaping !
CSV?
Real production example: The Omniture datafeed:
• <tab> separated record (~635 columns)
• The product_list column is a , separated list of products.
• A product is a ; separated record of fields.
• One of those fields is a | separated list of
• = separated key=value entries
• If the key is eVar8 then the value is a _ separated pair of product id and title.
;9200000010474211;;;;eVar27=not shown|eVar3=tools| eVar35=1-
1|eVar39=2:BB:P|eVar47=PGT|eVar60=d:P| eVar72=10003747|eVar73=80001655|
eVar8=9200000010474211_Product title|
eVar9=seller 1019104 MaQui
I simplified the previous example…
The real mess we have:
;9200000010474211;;;;eVar27=not
shown|eVar3=tools| eVar35=1-
1|eVar39=2:BB:P|eVar47=PGT|eVar60=d:P|
eVar72=10003747|eVar73=80001655|
eVar8=9200000010474211_Heller borenset - 25-
delig - 1|15|2|25|3|35|4|45|5|55|6|65
|7|75|8|85|9|95|10|105|11|115|12|125|13 mm - niet
voor intensief gebruik|
eVar9=seller 1019104 MaQui
Somebody forgot
the escaping !
Putting a string into a byte[]
• Did you assume US-ASCII?
• Or the MS-DOS 3.3 codepage 437?
• Or was it codepage 850?
• EBCDIC
• ASCII
• CP-1252
• ISO 8859-1 (Latin1)
• ISO 8859-5
• Unicode
• UTF-7
• UTF-8
• UTF-16
• UTF-32
• Big-endian UCS-2
• Little-endian UCS-2
• UCS-4
Bol.com
standard:
UTF-8
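In Java this boils down to never calling the charset-less String/byte[] conversions, which silently use the platform default. A minimal sketch:

import java.nio.charset.StandardCharsets;

String name = "Niels Basjes";
byte[] whoKnows = name.getBytes();                         // platform default charset: differs per machine
byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);       // explicit UTF-8, the bol.com standard
String back = new String(utf8, StandardCharsets.UTF_8);    // decode with the same charset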
READ THIS!
Google: “joel spolsky unicode”
Use binary formats!
• Google Protobuf
• Apache Avro
• …
So we need something to
• serialize records into bytes
• make serializing records easy and reliable
Data types
• String
• Integer
• Floating point
• Collection
• List
• Map
• Enumeration
So we need something to
• serialize records into bytes
• make serializing records easy and reliable
• supports data types (and exposes them in the API)
Defining a schema
• Names
• Types
• Optional / mandatory
• Default values
• Nesting
Defining a schema
• CSV
• Too bad: There is no schema.
• Manually write schema code
• Json
• Too bad: There is no schema
• Manually write schema code
• (JSON Schema, https://json-schema.org/, was still a draft in Q4 2018)
• XML
• XSD
We all love defining an XSD …
<xs:schema attributeFormDefault="unqualified"
           elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element type="xs:string" name="name"/>
        <xs:element type="xs:string" name="occupation"/>
        <xs:element name="home">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:string" name="city"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="company">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:string" name="name"/>
              <xs:element type="xs:string" name="city"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Defining a schema
• CSV
• Too bad: There is no schema.
• Manually write serde
• Json
• Too bad: There is no schema
• Manually write serde
• XML
• XSD
• Protobuf
• IDL
Protobuf
Code generation
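The .proto schema screenshot is not part of this transcript. As a sketch of what the code generation gives you (assuming a Person message had been defined in the .proto file), protoc produces a Java class with a builder and binary (de)serialization, so escaping and charsets stop being your problem:

import com.google.protobuf.InvalidProtocolBufferException;

// Person is the hypothetical class generated by protoc from the .proto definition.
Person person = Person.newBuilder()
    .setName("Niels Basjes <\"Hacker\">")
    .setOccupation("IT-Architect")
    .build();

byte[] bytes = person.toByteArray();      // serialize to the compact binary wire format
Person again = Person.parseFrom(bytes);   // deserialize (throws InvalidProtocolBufferException)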
Defining a schema
• CSV
• Too bad: There is no schema.
• Manually write serde
• Json
• Too bad: There is no schema
• Manually write serde
• XML
• XSD
• Protobuf
• IDL
• Avro
• Json
• IDL
Apache Avro (Json Schema)
Apache Avro (IDL Schema)
Code generation
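The schema screenshots on these slides are not reproduced in this transcript. As a sketch (not the presenter's actual schema), the Person record from earlier could look like this in Avro IDL; the Avro Maven/Gradle plugin then generates the Java classes from it:

@namespace("com.example.person")
protocol PersonProtocol {
  /** Doc comments end up in the generated code, which helps the consumers. */
  record Address {
    string city;
  }
  record Company {
    string name;
    string city;
  }
  record Person {
    string name;
    string occupation;
    Address home;
    Company company;
  }
}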
Production example
• Import statements
• Consumer comments
So we need something to
• serialize records into bytes
• make serializing records easy and reliable
• supports data types (and exposes them in the API)
• make defining and using records easy
Applications change!
• New business
• New insights
• New wishes
• New scope
• New …
The records will
• get new fields
• have obsolete fields
So we need something to
• serialize records into bytes
• make serializing records easy
• supports data types (and exposes them in the API)
• make defining and using records easy
• make defining and distributing new versions easy
Kafka persists messages
• A message is retained until the TTL expires.
• So a topic will contain several message versions!
• With different fields
V1 V2
V3 V4
Rolling upgrades
• During producer upgrade
• New data in multiple versions is created at the same time
• During consumer upgrade
• Multiple ‘expected’ versions in a single consumer
• Multiple consumers, multiple versions
Creating a new version of a schema
• Assume separate jar library with the compiled schema code.
• Scenario 1:
• Producer gets upgraded to V2 and produces
• Consumer (V1 compiled) reads V2 message.
• Scenario 2:
• Kafka with existing V1 and V2 records
• New consumer (V2 compiled) reads from start.
• Requirement:
• V1 and V2 must be 2-way compatible.
So we need something to
• serialize records into bytes
• make serializing records easy
• supports data types (and exposes them in the API)
• make defining and using records easy
• make defining and distributing new versions easy
• make evolving to new schema versions easy
Evolving Protobuf
• Fields are tagged with a number
• Evolution is ‘number’ based.
• Schema evolution is ‘easy’ if you can do
that.
• Making 2-way compatible is TOO HARD.
Evolving Avro
• Fields are tagged by NAME
• Evolution is ‘name’ based.
• You can add new fields anywhere
• Making it 2-way compatible is easy
is what we use !
Apache Avro
Simple rules for evolving a schema
1. Field is mandatory and will never be removed
• type field;
2. Field is optional and will never be removed
• union { null, type } field;
3. Field is newly defined and/or can be changed/removed
• type field = "default";
4. Field is optional and newly defined and/or can be
changed/removed (see the example after this list)
• union { null, type } field = "default";
• union { null, type } field = null;
5. Enum
• Avoid enums because these are number based
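A small illustration of rules 3 and 4 in Avro IDL (the record and field names are hypothetical): V2 adds an optional field with a default, so V1 readers simply skip the unknown field and V2 readers fill in the default when they meet old V1 data, which is exactly the 2-way compatibility required above.

// V1 of the record
record Measurement {
  string visitorId;
  string pageUrl;
}

// V2: a new optional field with a default value, added without breaking V1
record Measurement {
  string visitorId;
  string pageUrl;
  union { null, string } abTestVariant = null;
}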
Avro Message format
• Needs a schema registry
• Fingerprint → JSON Schema
Only the 64-bit id of the schema is in the message
We need a Schema Database/Registry
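A minimal sketch of this wire format using Avro's single-object encoding (assuming Avro 1.8.2 or later): the encoder prefixes every payload with two magic bytes plus the 64-bit CRC-64-AVRO fingerprint of the writer schema, and the decoder resolves that fingerprint through a SchemaStore, which in production would be backed by the schema registry.

import java.nio.ByteBuffer;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.avro.message.BinaryMessageDecoder;
import org.apache.avro.message.BinaryMessageEncoder;
import org.apache.avro.message.SchemaStore;

// Fragment: encode() and decode() throw IOException.
Schema schema = new Schema.Parser().parse(
    "{\"type\":\"record\",\"name\":\"Person\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}");
GenericRecord record = new GenericRecordBuilder(schema).set("name", "Niels").build();

// Encode: 2 magic bytes + 8-byte schema fingerprint + the Avro binary payload.
BinaryMessageEncoder<GenericRecord> encoder = new BinaryMessageEncoder<>(GenericData.get(), schema);
ByteBuffer bytes = encoder.encode(record);

// Decode: the writer schema is looked up by its fingerprint.
// A local cache stands in for the real schema registry here.
SchemaStore.Cache schemaRegistry = new SchemaStore.Cache();
schemaRegistry.addSchema(schema);
BinaryMessageDecoder<GenericRecord> decoder =
    new BinaryMessageDecoder<>(GenericData.get(), schema, schemaRegistry);
GenericRecord decoded = decoder.decode(bytes);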
Always check if you did it right.
Producing from Flink into Kafka
Produce from Flink into Kafka
• …
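The original slides showed code screenshots that are not part of this transcript. Below is a minimal sketch, not the presenter's actual code, of one way to wire this up: a Flink SerializationSchema that writes Avro-generated records in the single-object format described earlier, plugged into the FlinkKafkaProducer connector of that era. Measurement is a hypothetical Avro-generated class.

import java.io.IOException;
import java.util.Properties;
import org.apache.avro.message.BinaryMessageEncoder;
import org.apache.avro.specific.SpecificData;
import org.apache.avro.specific.SpecificRecordBase;
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

// Serializes any Avro-generated record using the single-object encoding
// (magic bytes + 64-bit schema fingerprint + binary payload).
public class AvroSingleObjectSerializer<T extends SpecificRecordBase>
        implements SerializationSchema<T> {
    @Override
    public byte[] serialize(T record) {
        try {
            // For brevity a new encoder is built per record; a real job would cache it per schema.
            return new BinaryMessageEncoder<T>(SpecificData.get(), record.getSchema())
                    .encode(record)
                    .array();
        } catch (IOException e) {
            throw new RuntimeException("Failed to serialize Avro record", e);
        }
    }
}

// Wiring it into a Kafka sink; measurements is a DataStream<Measurement> built earlier in the job.
Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
FlinkKafkaProducer<Measurement> sink =
    new FlinkKafkaProducer<>("measurements", new AvroSingleObjectSerializer<Measurement>(), props);
measurements.addSink(sink);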
Consume from Kafka into Flink
Consume from Kafka into Flink
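Again, the slide's code screenshot is not in this transcript; a matching sketch for the consuming side, with the same caveats (Measurement is a hypothetical Avro-generated class, and the in-memory SchemaStore stands in for the real schema registry):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.message.BinaryMessageDecoder;
import org.apache.avro.message.SchemaStore;
import org.apache.avro.specific.SpecificData;
import org.apache.avro.specific.SpecificRecordBase;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

// Decodes single-object encoded Avro messages; older and newer writer versions are
// resolved against the reader schema compiled into the generated class.
public class AvroSingleObjectDeserializer<T extends SpecificRecordBase>
        implements DeserializationSchema<T> {

    private final Class<T> recordClass;
    private transient BinaryMessageDecoder<T> decoder;

    public AvroSingleObjectDeserializer(Class<T> recordClass) {
        this.recordClass = recordClass;
    }

    @Override
    public T deserialize(byte[] message) throws IOException {
        if (decoder == null) {
            Schema readerSchema = SpecificData.get().getSchema(recordClass);
            // In a real job every known writer-schema version is registered here
            // (or fetched on demand from the schema registry).
            SchemaStore.Cache schemaStore = new SchemaStore.Cache();
            schemaStore.addSchema(readerSchema);
            decoder = new BinaryMessageDecoder<>(SpecificData.get(), readerSchema, schemaStore);
        }
        return decoder.decode(ByteBuffer.wrap(message));
    }

    @Override
    public boolean isEndOfStream(T nextElement) {
        return false;   // a Kafka topic is unbounded
    }

    @Override
    public TypeInformation<T> getProducedType() {
        return TypeInformation.of(recordClass);
    }
}

// Wiring it into a Kafka source; env is the job's StreamExecutionEnvironment.
Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("group.id", "measurement-consumer");
env.addSource(new FlinkKafkaConsumer<>(
    "measurements", new AvroSingleObjectDeserializer<>(Measurement.class), props));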
Final thoughts
In practice
• AVRO schema evolution works great for streaming
• There is no NEED to upgrade all consumers
• Schema evolution is also used to limit the loaded fields (see the sketch after this list)
• Avoid needless Garbage Collections
• Applicable to any ‘single record’ storage
• HBase columns.
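How the "limit the loaded fields" trick works: decode with a reader schema that only declares the fields this consumer needs, and Avro skips the rest instead of materialising it. A sketch with hypothetical field names; writerSchema and payloadBytes are assumed to come from the message (for example via the fingerprint lookup shown earlier):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;

// Reader schema containing only the fields we actually use (a subset of the writer schema).
Schema readerSchema = new Schema.Parser().parse(
    "{\"type\":\"record\",\"name\":\"Measurement\",\"fields\":[{\"name\":\"visitorId\",\"type\":\"string\"}]}");

// Fragment: read() throws IOException.
DatumReader<GenericRecord> reader = new GenericDatumReader<>(writerSchema, readerSchema);
GenericRecord trimmed = reader.read(null, DecoderFactory.get().binaryDecoder(payloadBytes, null));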
Thanks
till next bol.com
Niels Basjes
nbasjes@bol.com
Speaker notes

1. To aid the developers and acceptance testers in validating whether the measurements have been done correctly, we intend to create a plugin or overlay so that, when in the bol.com office (or connected via VPN), you can validate which measurements have actually been recorded for the page you are looking at right now (i.e. ONLY the page YOU are looking at).