Streaming and Social Media
Joe Olson
Senior Manager, Big Data Analytics
Apache Road Show Chicago - May 2019
Agenda
1. United and the Airline Industry
2. How the Streaming Model Presents Opportunity
3. Apache Flink
4. Q & A
About United Airlines
 1,348 aircraft (779 mainline, 569 regional) with 250+ on order (supply chain)
 158M passengers in 2018 (public-facing web site, mobile app, time / geospatial based inventory, loyalty program, surveys, ancillary sales)
 4,900 daily departures (scheduling, operations, weather, route planning)
 355 airports served in 48 countries (baggage claim, check-ins)
 88,000 employees worldwide (scheduling, pay)
 Constantly in motion! The future (and the past) is always changing.
 A data scientist's / data engineer's dream.
Source: https://hub.united.com/corporate-fact-sheet/
Business Goals
 Improve Customer Experience
- How can we reduce friction when customers book a reservation or move through an airport?
- How can we deliver a consistent message across all channels (mobile app, web site, social media, etc.)?
 Improve Employee Experience
- How can we keep employees better informed of the current situation so they can relay it to the customers?
- What are we learning from our surveys about what the customer base says is / isn’t working?
 Revenue Generation
- What personalized offers can we make to our customers?
- Are our offers competitive with the rest of the industry?
 Improve Operational Reliability
- How can we better prepare for weather or other operational interruptions?
- How can we manage the fleet better and ensure spare parts are where they need to be?
Industry Ideas – Customer Experience
Use Case – Improve Customer Experience Via Social Media
 Social media represents a unique opportunity for any service company
- Connect with customers in a familiar environment.
- Consistent messaging and brand management.
- Build community and advocacy.
- Direct issues to the appropriate channels so they can be handled quickly.
Use Case – Customer Experience
 Can we use social media as a giant issue tracking database?
 Obstacles:
- Who am I talking to?
- Is there an issue? If so, what is the issue?
- What is the current state of the issue? How did it get there?
- Are there any recommendations on how to handle the issue?
- Who is best equipped to handle this issue?
All of these need to be overcome within a few seconds of receiving a notification…
Use Case – Customer Experience
 Actions
- Identification (Who am I talking to?)
- Classification, prioritization (Is there an issue? What is it? How important is it?)
- State determination (What is the current state of the issue? How did it get there?)
- Recommendation, clustering (Are there any recommendations on how to handle the issue?)
- Routing (Who is best equipped to handle this issue?)
Conclusion: several enrichments + state lookup
Other needs: low latency, fault tolerance, high availability, elasticity…
Stream Processing Engine
 Apache Flink - Stateful Computations over Data Streams
What about enrichment?
Stream Processing Engine - Enrichment
 Enrichment options:
- Option 1: Data lives in an external database or service and is looked up from inside a map function
- Option 2: Data arrives as a second stream
Option #1:
[Diagram: two parallel pipelines, each Social Media Messages → Source → Map (keyBy) → Map]
Stream Processing Engine - Enrichment
 Option #1 Issues
- Synchronous requests are slow and prone to error, jamming up the pipeline
- Wasted resources while waiting for the service to respond
 What about asynchronous?
- AsyncFunction in DataStream API since Flink 1.2
• A queue of promises
• Emitter on a different thread
- Client needs to support async requests
Stream Processing Engine - Enrichment
 Async call:
DataStream<Tuple2<String, String>> result =
    AsyncDataStream.unorderedWait(    // or orderedWait, same arguments
        stream,
        new MyAsyncFunction(),        // our AsyncFunction
        1000, TimeUnit.MILLISECONDS,  // timeout
        100);                         // capacity
– our AsyncFunction
– a timeout: max time until a request is considered failed
– capacity: max number of queued-up requests
– unorderedWait: emit results in order of completion
– orderedWait: emit results in order of arrival
Timeout: an exception is thrown by default. Override AsyncFunction#timeout to handle it differently.
Capacity exceeded: back pressure.
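The difference between the two emission orders can be illustrated without Flink. The following is a minimal plain-JDK sketch (Java 9+), not the Flink implementation: the class name, the `request` helper and its simulated delays are assumptions made up for illustration, and capacity / back pressure is not modelled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of ordered vs. unordered emission of async results.
final class AsyncEnrichmentSketch {

    // Simulated external lookup: completes after delayMillis, fails after timeoutMillis.
    static CompletableFuture<String> request(String message, long delayMillis, long timeoutMillis) {
        return CompletableFuture
                .supplyAsync(() -> message + ":enriched",
                        CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS))
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // orderedWait analogue: results are emitted in arrival order,
    // so a slow request holds back every result queued behind it.
    static List<String> orderedWait(List<CompletableFuture<String>> pending) {
        List<String> out = new ArrayList<>();
        for (CompletableFuture<String> f : pending) {
            out.add(f.join());
        }
        return out;
    }

    // unorderedWait analogue: each result is emitted as soon as its request completes.
    static List<String> unorderedWait(List<CompletableFuture<String>> pending) {
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        CompletableFuture<?>[] emitted = pending.stream()
                .map(f -> f.thenAccept(out::add))
                .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(emitted).join(); // wait until every request has emitted
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        System.out.println(orderedWait(List.of(
                request("slow", 200, 1000), request("fast", 20, 1000))));
        System.out.println(unorderedWait(List.of(
                request("slow", 200, 1000), request("fast", 20, 1000))));
    }
}
```

With these delays the ordered variant keeps "slow" first, while the unordered variant is expected to emit "fast" first. A request that exceeds its timeout makes `join()` throw, which mirrors the "exception thrown" default described above.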
Stream Processing Engine - Enrichment
 Option #2 - joining streams
[Diagram: Social Media Messages → Source → Map (keyBy) and Events → Source → Map (keyBy), both feeding a Join]
Stream Processing Engine - Joining
 Window join
- Only elements within the same window can be joined
• Tumbling window
• Sliding window
• Session window
 Interval join
- Joins elements with a common key where elements of stream B have event timestamps that lie in a relative time interval to the event timestamps of elements of stream A
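The two join conditions can be written down as small predicates. This is a plain-JDK sketch rather than Flink code: `JoinSketch`, `Event` and the nested-loop `intervalJoin` are hypothetical names for illustration, and a real engine would buffer elements per key in state instead of scanning lists. The interval bounds are treated as inclusive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of window assignment and the interval-join condition.
final class JoinSketch {

    /** Minimal event: a join key, an event timestamp in ms, and a payload. */
    static final class Event {
        final String key;
        final long timestampMillis;
        final String payload;

        Event(String key, long timestampMillis, String payload) {
            this.key = key;
            this.timestampMillis = timestampMillis;
            this.payload = payload;
        }
    }

    /** Start of the tumbling window of the given size that contains the timestamp. */
    static long tumblingWindowStart(long timestampMillis, long sizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    }

    /** Interval-join condition: tsB lies in [tsA + lower, tsA + upper]. */
    static boolean inInterval(long tsA, long tsB, long lowerMillis, long upperMillis) {
        return tsB >= tsA + lowerMillis && tsB <= tsA + upperMillis;
    }

    /** Joins every pair with the same key whose timestamps satisfy the interval condition. */
    static List<String> intervalJoin(List<Event> streamA, List<Event> streamB,
                                     long lowerMillis, long upperMillis) {
        List<String> out = new ArrayList<>();
        for (Event a : streamA) {
            for (Event b : streamB) {
                if (a.key.equals(b.key)
                        && inInterval(a.timestampMillis, b.timestampMillis, lowerMillis, upperMillis)) {
                    out.add(a.payload + "|" + b.payload);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> messages = List.of(new Event("u1", 1_000L, "tweet"));
        List<Event> events = List.of(new Event("u1", 1_500L, "delay"),
                                     new Event("u1", 5_000L, "late"));
        // Join events from 2 s before to 1 s after each message.
        System.out.println(intervalJoin(messages, events, -2_000L, 1_000L));
    }
}
```

For a window join, two elements with the same key are joinable when `tumblingWindowStart` returns the same value for both; the interval join instead compares each pair of timestamps directly.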
Stream Processing Engine - State
 Managing state
- Ability to store and retrieve information about a key.
[Diagram: Client-Server vs. Stateful Streaming]
Stream Processing Engine - State
 Operates on key-value pairs of a keyed stream
 Several possible back ends, all easily configurable at cluster creation time:
- Memory (very small state)
- File on disk
- RocksDB (very large state)
[Diagram: Keyed Stream, <Key> → <Value>]
Stream Processing Engine - State
 Types of state
- ValueState<T> - use this when the state is a single value
- ListState<T> - use this when the state is a list of items
- ReducingState<T> - single value that represents an aggregation of all values added to state
- AggregatingState<IN, OUT> - similar to ReducingState, but the aggregated type may differ from the type of the elements added to the state
- MapState<UK, UV> - mapping. Can use put(UK, UV) or get(UK). Also iterable.
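The semantics of the first three state types can be mimicked with ordinary maps keyed by the stream key. This is a plain-JDK sketch, not the Flink state API: the class, its method names and the customer / status example are assumptions chosen to match the "current state of the issue, and how did it get there" question from the use case.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical illustration of per-key state; each map plays the role of one state type.
final class KeyedStateSketch {

    // ValueState<T> analogue: a single value per key, overwritten on update.
    private final Map<String, String> lastStatus = new HashMap<>();

    // ListState<T> analogue: an appended list of items per key.
    private final Map<String, List<String>> history = new HashMap<>();

    // ReducingState<T> analogue: one value per key, folded with a reduce function.
    private final Map<String, Integer> messageCount = new HashMap<>();
    private final BinaryOperator<Integer> reducer = Integer::sum;

    /** Processes one element of the keyed stream and updates all three states for its key. */
    void onMessage(String customerId, String status) {
        lastStatus.put(customerId, status);
        history.computeIfAbsent(customerId, k -> new ArrayList<>()).add(status);
        messageCount.merge(customerId, 1, reducer);
    }

    /** Current state of the issue, or null if the key has no state yet. */
    String currentStatus(String customerId) {
        return lastStatus.get(customerId);
    }

    /** How the issue got to its current state. */
    List<String> statusHistory(String customerId) {
        return history.getOrDefault(customerId, List.of());
    }

    int count(String customerId) {
        return messageCount.getOrDefault(customerId, 0);
    }

    public static void main(String[] args) {
        KeyedStateSketch state = new KeyedStateSketch();
        state.onMessage("c1", "OPEN");
        state.onMessage("c1", "ESCALATED");
        System.out.println(state.currentStatus("c1") + " via " + state.statusHistory("c1"));
    }
}
```

In a stream processor the key is implicit (the state handle is scoped to the current element's key) and the maps live in the configured state back end rather than on the heap of one object.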
Stream Processing Engine – Queryable State
 Ability to query state from outside a Flink cluster via an API:
[Diagram: Flink Compute Cluster with a Keyed Stream, <K> → <V>, queried from outside]
State - Other Issues
 Fault tolerance and high availability: Savepointing
- Savepoints are written to durable storage (HDFS / S3, etc.)
Stream Processing - Other Issues
 Elasticity:
- Active mode (Flink controls resource allocation) vs. reactive mode (an external entity controls resource allocation).
- FLIP-6
- Idea: the cluster manager creates and destroys task managers based on demand.
- See Flink Forward San Francisco 2019: "Future of Apache Flink Deployments: Containers, Kubernetes and More" by Till Rohrmann
Use Case – Customer Experience
 Actions
- Identification (Who am I talking to?)
- Classification, prioritization (Is there an issue? What is it?)
- State determination (What is the current state of the issue, and how did it get there?)
- Recommendation, clustering (Are there any recommendations on how to handle the issue?)
- Routing (Who is best equipped to handle this issue?)
 Some of these are machine learning / model type applications.
 How to switch model versions without interrupting the stream?
- Control Stream!
Stream Processing Engine – Interacting With Models
 Control Stream:
[Diagram: Social Media Messages → Source → Map (KeyBy) and Control Stream → Source → Map (KeyBy), joined by Connect → CoFlatMap with List State, selecting Model A or Model B]
Make sure the output stream records which model version was used!
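The control-stream idea can be sketched as one operator with two inputs: one for messages and one for control elements that swap the active model. This is a plain-JDK illustration, not Flink's CoFlatMap API: the class, the method names and the toy keyword "models" are assumptions made up for the example.

```java
import java.util.function.Function;

// Hypothetical two-input operator: messages on one input, model updates on the other.
final class ModelSwitchSketch {

    private String modelVersion;
    private Function<String, String> model;

    ModelSwitchSketch(String initialVersion, Function<String, String> initialModel) {
        this.modelVersion = initialVersion;
        this.model = initialModel;
    }

    /** Control input: replaces the active model without stopping the message input. */
    void onControl(String newVersion, Function<String, String> newModel) {
        this.modelVersion = newVersion;
        this.model = newModel;
    }

    /** Message input: applies the active model and tags the output with its version. */
    String onMessage(String message) {
        return modelVersion + ":" + model.apply(message);
    }

    public static void main(String[] args) {
        ModelSwitchSketch operator = new ModelSwitchSketch(
                "A", m -> m.contains("lost") ? "baggage" : "other");
        System.out.println(operator.onMessage("lost my bag"));

        // A control element arrives and switches the operator to model B.
        operator.onControl("B", m -> m.contains("delay") ? "delay" : "other");
        System.out.println(operator.onMessage("lost my bag"));
    }
}
```

Because every output carries the version prefix, downstream consumers can tell which model produced a given classification, which is the point of the reminder above.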
Apache Communities
 Twitter: @ApacheFlink
 Mailing Lists
- news@flink.apache.org
- community@flink.apache.org
- user@flink.apache.org
- dev@flink.apache.org
- issues@flink.apache.org
 Stack Overflow: apache-flink tag
 GitHub
- https://github.com/apache/flink
Apache Flink
Thank You!
We’re hiring!
- Data Engineers
- Data Scientists

Weitere ähnliche Inhalte

Was ist angesagt?

Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming VisualizationGuido Schmutz
 
Data Integration with Apache Kafka: What, Why, How
Data Integration with Apache Kafka: What, Why, HowData Integration with Apache Kafka: What, Why, How
Data Integration with Apache Kafka: What, Why, HowPat Patterson
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...HostedbyConfluent
 
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...HostedbyConfluent
 
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it YourselfWhy Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it YourselfDATAVERSITY
 
Operational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in KafkaOperational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in Kafkaconfluent
 
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to SurviveHadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Surviveconfluent
 
Event Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureEvent Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureGuido Schmutz
 
Real-time processing of large amounts of data
Real-time processing of large amounts of dataReal-time processing of large amounts of data
Real-time processing of large amounts of dataconfluent
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Guido Schmutz
 
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...confluent
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...HostedbyConfluent
 
Why SQL? | Kenny Gorman, Cloudera
Why SQL? | Kenny Gorman, ClouderaWhy SQL? | Kenny Gorman, Cloudera
Why SQL? | Kenny Gorman, ClouderaHostedbyConfluent
 
Using Apache Cassandra and Apache Kafka to Scale Next Gen Applications
Using Apache Cassandra and Apache Kafka to Scale Next Gen ApplicationsUsing Apache Cassandra and Apache Kafka to Scale Next Gen Applications
Using Apache Cassandra and Apache Kafka to Scale Next Gen ApplicationsData Con LA
 
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®confluent
 
Evolving from Messaging to Event Streaming
Evolving from Messaging to Event StreamingEvolving from Messaging to Event Streaming
Evolving from Messaging to Event Streamingconfluent
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
ETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data Pipelines
ETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data PipelinesETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data Pipelines
ETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data Pipelinesconfluent
 

Was ist angesagt? (20)

Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 
Data Integration with Apache Kafka: What, Why, How
Data Integration with Apache Kafka: What, Why, HowData Integration with Apache Kafka: What, Why, How
Data Integration with Apache Kafka: What, Why, How
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
 
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
 
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it YourselfWhy Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself
 
Operational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in KafkaOperational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in Kafka
 
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to SurviveHadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
 
Event Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureEvent Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data Architecture
 
Real-time processing of large amounts of data
Real-time processing of large amounts of dataReal-time processing of large amounts of data
Real-time processing of large amounts of data
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
 
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
 
Why SQL? | Kenny Gorman, Cloudera
Why SQL? | Kenny Gorman, ClouderaWhy SQL? | Kenny Gorman, Cloudera
Why SQL? | Kenny Gorman, Cloudera
 
Using Apache Cassandra and Apache Kafka to Scale Next Gen Applications
Using Apache Cassandra and Apache Kafka to Scale Next Gen ApplicationsUsing Apache Cassandra and Apache Kafka to Scale Next Gen Applications
Using Apache Cassandra and Apache Kafka to Scale Next Gen Applications
 
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®
 
Evolving from Messaging to Event Streaming
Evolving from Messaging to Event StreamingEvolving from Messaging to Event Streaming
Evolving from Messaging to Event Streaming
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
ETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data Pipelines
ETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data PipelinesETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data Pipelines
ETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data Pipelines
 

Ähnlich wie Streaming and Social Media

Narrative Mind Week 9 H4D Stanford 2016
Narrative Mind Week 9 H4D Stanford 2016Narrative Mind Week 9 H4D Stanford 2016
Narrative Mind Week 9 H4D Stanford 2016Stanford University
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United AirlinesDataWorks Summit
 
Evolution of a big data project
Evolution of a big data projectEvolution of a big data project
Evolution of a big data projectMichael Peacock
 
Jazz for Service Management
Jazz for Service ManagementJazz for Service Management
Jazz for Service ManagementIBM Danmark
 
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...Flink Forward
 
Divya 3 yrs exp in qa engg
Divya 3 yrs exp in qa enggDivya 3 yrs exp in qa engg
Divya 3 yrs exp in qa enggDivya Lakshmi.B
 
Getting Started with Splunk Enterprise
Getting Started with Splunk EnterpriseGetting Started with Splunk Enterprise
Getting Started with Splunk EnterpriseSplunk
 
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...Prolifics
 
MIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureMIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureEvan Chan
 
Increase payment platform adoption by growing partner/client categories
Increase payment platform adoption by growing partner/client categoriesIncrease payment platform adoption by growing partner/client categories
Increase payment platform adoption by growing partner/client categoriesBhaskar Jayaraman
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
 
Presentation Data Council Meetup: F. Mekkenholt, R. Vlijm
Presentation Data Council Meetup: F. Mekkenholt, R. VlijmPresentation Data Council Meetup: F. Mekkenholt, R. Vlijm
Presentation Data Council Meetup: F. Mekkenholt, R. VlijmAlexander Oppel
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...confluent
 
Split my monolith! Workshop
Split my monolith! Workshop Split my monolith! Workshop
Split my monolith! Workshop martinsson
 
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018VMware Tanzu
 

Ähnlich wie Streaming and Social Media (20)

Narrative Mind Week 9 H4D Stanford 2016
Narrative Mind Week 9 H4D Stanford 2016Narrative Mind Week 9 H4D Stanford 2016
Narrative Mind Week 9 H4D Stanford 2016
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United Airlines
 
Evolution of a big data project
Evolution of a big data projectEvolution of a big data project
Evolution of a big data project
 
Jazz for Service Management
Jazz for Service ManagementJazz for Service Management
Jazz for Service Management
 
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
 
Divya 3 yrs exp in qa engg
Divya 3 yrs exp in qa enggDivya 3 yrs exp in qa engg
Divya 3 yrs exp in qa engg
 
Getting Started with Splunk Enterprise
Getting Started with Splunk EnterpriseGetting Started with Splunk Enterprise
Getting Started with Splunk Enterprise
 
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
 
Rohit_Gupta
Rohit_GuptaRohit_Gupta
Rohit_Gupta
 
MIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureMIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data Architecture
 
Increase payment platform adoption by growing partner/client categories
Increase payment platform adoption by growing partner/client categoriesIncrease payment platform adoption by growing partner/client categories
Increase payment platform adoption by growing partner/client categories
 
AlBaraaAhmed_20160523
AlBaraaAhmed_20160523AlBaraaAhmed_20160523
AlBaraaAhmed_20160523
 
Loan Decisioning Transformation
Loan Decisioning TransformationLoan Decisioning Transformation
Loan Decisioning Transformation
 
Rohit Gupta
Rohit GuptaRohit Gupta
Rohit Gupta
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
Presentation Data Council Meetup: F. Mekkenholt, R. Vlijm
Presentation Data Council Meetup: F. Mekkenholt, R. VlijmPresentation Data Council Meetup: F. Mekkenholt, R. Vlijm
Presentation Data Council Meetup: F. Mekkenholt, R. Vlijm
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
 
IBM Rational HATS Overview 2013
IBM Rational HATS Overview 2013IBM Rational HATS Overview 2013
IBM Rational HATS Overview 2013
 
Split my monolith! Workshop
Split my monolith! Workshop Split my monolith! Workshop
Split my monolith! Workshop
 
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
Greenplum for Internet Scale Analytics and Mining - Greenplum Summit 2018
 

Kürzlich hochgeladen

Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 

Kürzlich hochgeladen (20)

Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 

Streaming and Social Media

  • 1. Streaming and Social Media. Joe Olson, Senior Manager, Big Data Analytics. Apache Road Show Chicago, May 2019
  • 2. Agenda
      1. United and the Airline Industry
      2. How Streaming Model Presents Opportunity
      3. Apache Flink
      4. Q & A
  • 3. About United Airlines
      - 1,348 aircraft (779 mainline, 569 regional) with 250+ on order (supply chain)
      - 158M passengers in 2018 (public facing web site, mobile app, time / geospatial based inventory, loyalty program, surveys, ancillary sales)
      - 4900 daily departures (scheduling, operations, weather, route planning)
      - 355 airports served, in 48 countries (baggage claim, check-ins)
      - 88,000 employees worldwide (scheduling, pay)
      - Constantly in motion! Future (and past) always changing.
      - A data scientist / data engineer dream.
      Source: https://hub.united.com/corporate-fact-sheet/
  • 4. Business Goals
      - Improve Customer Experience
          - How can we reduce friction when booking a reservation? Maneuvering through an airport?
          - How can we deliver a consistent message across all channels (mobile app, web site, social media, etc.)?
      - Improve Employee Experience
          - How can we keep employees better informed of the current situation so they can relay it to the customers?
          - What are we learning from our surveys about what the customer base says is / isn’t working?
      - Revenue Generation
          - What personalized offers can we make to our customers?
          - Are our offers competitive with the rest of the industry?
      - Improve Operational Reliability
          - How can we better prepare for weather or other operational interruptions?
          - How can we manage the fleet better and ensure spare parts are where they need to be?
  • 5. Industry Ideas – Customer Experience
  • 6. Use Case – Improve Customer Experience Via Social Media
      - Social media represents a unique opportunity for any service company
          - Connect with customers in a familiar environment.
          - Consistent messaging and brand management.
          - Build community and advocacy.
          - Direct issues to appropriate channels so they can be handled expediently.
  • 7. Use Case – Customer Experience
      - Can we use social media as a giant issue tracking database?
      - Obstacles:
          - Who am I talking to?
          - Is there an issue? If so, what is the issue?
          - What is the current state of the issue? How did it get there?
          - Are there any recommendations on how to handle the issue?
          - Who is best equipped to handle this issue?
      All of these need to be overcome within a few seconds of receiving a notification…
  • 8. Use Case – Customer Experience
      - Actions
          - Identification (Who am I talking to?)
          - Classification, prioritization (Is there an issue? What is it? How important is it?)
          - State determination (What is the current state of the issue? How did it get there?)
          - Recommendation, clustering (Are there any recommendations on how to handle the issue?)
          - Routing (Who is best equipped to handle this issue?)
      Conclusion: several enrichments + state lookup
      Other needs: low latency, fault tolerance, high availability, elasticity…
  • 9. Stream Processing Engine
      - Apache Flink: Stateful Computations over Data Streams
      What about enrichment?
  • 10. Stream Processing Engine - Enrichment
      - Enrichment options:
          - Option 1: Data lives in an external database or service and is looked up from a map function
          - Option 2: Data arrives as a second stream
      [Diagram, Option #1: Social Media Messages, Source, Map (keyBy), Map]
  • 11. Stream Processing Engine - Enrichment
      - Option #1 Issues
          - Synchronous requests are slow and prone to error, jamming up the pipeline
          - Wasted resources while waiting for the service to respond
      - What about asynchronous?
          - AsyncFunction in DataStream API since Flink 1.2
              • A queue of promises
              • Emitter on a different thread
          - Client needs to support async requests
  • 12. Stream Processing Engine - Enrichment
      - Async call:
          DataStream<Tuple2<String, String>> result = AsyncDataStream.(un)orderedWait(stream, new MyAsyncFunction(), 1000, TimeUnit.MILLISECONDS, 100)
          - MyAsyncFunction: our AsyncFunction
          - timeout: max time until a request is considered failed
          - capacity: max number of queued up requests
          - unorderedWait: emit results in order of completion
          - orderedWait: emit results in order of arrival
      Timeout: Exception thrown. Can override exception handler.
      Capacity exceeded: back pressure.
  • 13. Stream Processing Engine - Enrichment
      - Option #2: joining streams
      [Diagram: Social Media Messages, Source, Map (keyBy) and Events, Source, Map (keyBy), both feeding a Join]
  • 14. Stream Processing Engine - Joining
      - Window join
          - Only elements within the same window can be joined
              • Tumbling window
              • Sliding window
              • Session window
      - Interval Join
          • Joins elements that share a common key, where elements of stream B have event timestamps that lie in a relative time interval to the event timestamps of elements in stream A
  • 15. Stream Processing Engine - State
      - Managing state
          - Ability to store and retrieve information about a key.
      [Diagram: Client-Server vs. Stateful Streaming]
  • 16. Stream Processing Engine - State
      - Operate on a key-value pull on a keyed stream
      - Several possible back ends, all easily configurable at cluster create time:
          - Memory (very small state)
          - File on disk
          - RocksDB (very large state)
      [Diagram: Keyed Stream, <Key>, <Value>]
  • 17. Stream Processing Engine - State
      - Types of state
          - ValueState<T>: use this when the state is a single value
          - ListState<T>: use this when the state is a list of items
          - ReducingState<T>: a single value that represents an aggregation of all values added to state
          - AggregatingState<IN, OUT>: similar to ReducingState, but the aggregated type can differ from the type of the values added
          - MapState<UK, UV>: a mapping. Can use put(UK, UV) or get(UK). Also iterable.
  • 18. Stream Processing Engine – Queryable State
      - Ability to query state from outside a Flink cluster via an API
      [Diagram: Flink Compute Cluster, Keyed Stream, <K>, <V>]
  • 19. State - Other Issues
      - Fault tolerance and high availability: Savepointing (HDFS / S3, etc.)
  • 20. Stream Processing - Other Issues
      - Elasticity:
          - Flink Active (Flink controls resource allocation) / Reactive (external entity controls resource allocation) mode.
          - FLIP-6
          - Idea: cluster manager creates and destroys task managers based on demand.
          - Flink Forward San Francisco 2019: "Future of Apache Flink Deployments: Containers, Kubernetes and More", Till Rohrmann
  • 21. Use Case – Customer Experience
      - Actions
          - Identification (Who am I talking to?)
          - Classification, prioritization (Is there an issue? What is it?)
          - State determination (What is the current state of the issue, and how did it get there?)
          - Recommendation, clustering (Are there any recommendations on how to handle the issue?)
          - Routing (Who is best equipped to handle this issue?)
      - Some of these are machine learning / model type applications.
      - How to switch model versions without interrupting the stream?
          - Control Stream!
  • 22. Stream Processing Engine – Interacting With Models
      - Control Stream:
      [Diagram: Social Media Messages, Source, Map (KeyBy) and Control Stream, Source, Map (KeyBy), combined via Connect into a CoFlatMap with Model A, Model B, and List State]
      Make sure the output stream contains which model version was used!
  • 23. Apache Communities (Apache Flink)
      - Twitter: @ApacheFlink
      - Mailing Lists
          - news@flink.apache.org
          - community@flink.apache.org
          - user@flink.apache.org
          - dev@flink.apache.org
          - issues@flink.apache.org
      - Stack Overflow: apache-flink tag
      - Github
          - https://github.com/apache/flink
  • 24. Thank You! We’re hiring!
      - Data Engineers
      - Data Scientists
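
Slide 12 distinguishes orderedWait (emit results in order of arrival) from unorderedWait (emit results in order of completion). The following is a minimal sketch of that difference using only the JDK's CompletableFuture as the "queue of promises"; it does not use Flink. The class name AsyncOrderingSketch, the helper methods, and the sample messages are all hypothetical and are not taken from the talk or from Flink's API. Timeout and capacity handling are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration only: plain JDK futures, not Flink's AsyncDataStream.
public class AsyncOrderingSketch {

    // Arrival order: results are emitted in the order the requests were queued,
    // so a slow early request holds back faster later ones.
    static List<String> emitOrdered(List<CompletableFuture<String>> queue) {
        List<String> out = new ArrayList<>();
        for (CompletableFuture<String> request : queue) {
            out.add(request.join()); // waits for this request before moving on
        }
        return out;
    }

    // Completion order: each result is emitted as soon as its own request finishes.
    static List<String> emitUnordered(List<CompletableFuture<String>> queue) {
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        for (CompletableFuture<String> request : queue) {
            request.thenAccept(out::add); // callback fires when the request completes
        }
        return out;
    }

    // Three enrichment requests arrive as msg1, msg2, msg3,
    // but the (simulated) external service answers msg3 first.
    static List<List<String>> demo() {
        CompletableFuture<String> first = new CompletableFuture<>();
        CompletableFuture<String> second = new CompletableFuture<>();
        CompletableFuture<String> third = new CompletableFuture<>();
        List<CompletableFuture<String>> queue = List.of(first, second, third);

        // Register the completion callbacks before any response arrives.
        List<String> unordered = emitUnordered(queue);

        // Simulated responses, completing out of arrival order.
        third.complete("msg3+profile");
        first.complete("msg1+profile");
        second.complete("msg2+profile");

        List<String> ordered = emitOrdered(queue);
        return List.of(ordered, new ArrayList<>(unordered));
    }

    public static void main(String[] args) {
        List<List<String>> results = demo();
        System.out.println("ordered:   " + results.get(0));
        System.out.println("unordered: " + results.get(1));
        // ordered:   [msg1+profile, msg2+profile, msg3+profile]
        // unordered: [msg3+profile, msg1+profile, msg2+profile]
    }
}
```

In this sketch the unordered variant emits msg3 as soon as it completes, which reduces waiting but gives up arrival order. The ordered variant preserves arrival order at the cost of waiting on the slowest earlier request.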