Design Patterns for Big Data Architecture: Best Strategies for Streamlined [Simple, Powerful] Design

Allen Day, PhD
Data Scientist, MapR Technologies
October 2013
©MapR Technologies - Confidential
Me, Us
• Allen Day, Principal Data Scientist, MapR
R contributor (10 yr), Hadoop (6 yr)
Human Genetics (UCLA Medicine), Machine Learning

• MapR
Distributes open source components for Hadoop
Adds major enhancements for performance, high-availability, and ease-of-use

• See Also
- "allenday" most places (twitter, github, etc.)
- aday@maprtech.com, allenday@allenday.com
- @mapR
Three Business Use Cases
Personalized Search
• Public web index + personal search history
• Custom ranking of results

Personalized Medicine
• Patient medical history
• Genomic info.
• Match against database of therapies

Market Segmentation
• Group similar customers
• Target with cross-sell / upsell campaign
Three Business Use Cases
• Personalized Search: personal data
• Personalized Medicine: personal data
• Market Segmentation: marketing
Which ones are similar? How can you tell?
But First


WHAT IS A DESIGN PATTERN?

"a design pattern is a general reusable solution to a commonly occurring problem within a given context in software design. A design pattern is not a finished design that can be transformed directly into source or machine code. It is a description or template for how to solve a problem that can be used in many different situations"
http://en.wikipedia.org/wiki/Software_design_pattern
History of Design Pattern Ideation
• 1977: Architecture & Civil Engineering
• 1994: OO Software Architecture
• 2012: Parallelization Software
• ?: Application Parallelization
Not Just Software

http://en.wikipedia.org/wiki/A-line
Big Data Application Shapes
1. How big is your input record?
2. How big is the data that is relevant to processing the
input record?
3. How big is the total data that could be relevant to
processing the input?
4. How fast do inputs flow in?
5. How fast do outputs need to flow out?
6. How complex (unstructured) are 1-5?
7. How predictable are 1-6? (spikiness, variance)
8. Is accuracy more important than speed?
9. Does the processing contain cycles (feedback loops)?
Choose a Pattern: Volume & Velocity
1. How big is your target data?
<10 GB / mid / >200 GB
2. How big is your query data?
Single element at a time / One pass over 100% / Multiple passes over big chunks
3. How fast do you need a result?
Throughput > response / < 100s (human scale)
Patterns: ?, ?, A, B Big storage, C Streaming, D Nearline Analytics, E Exploratory Analysis
Twitter Zeitgeist as a Composite of Design Patterns
Live data source, e.g. Twitter Firehose
B Big storage
C Streaming
D Nearline Analytics
Downstream applications
Big Data Application Shapes
1. How big is your input record?
2. How big is the data that is relevant to processing the input record?
3. How big is the total data that could be relevant to processing the input?
4. How fast do inputs flow in?
5. How fast do outputs need to flow out?
6. How complex (unstructured) are 1-5?
7. How predictable are 1-6? (spikiness, variance)
8. Is accuracy more important than speed?
9. Does the processing contain cycles (feedback loops)?
Categories: Volume, Velocity, Variety, Intents & Methods
Application characteristic   Personalized Search   Personalized Medicine   Market Segmenting
Input record size            Small                 Large                   Small
Co-processed data size       Large                 Large                   Large
Archive size                 Large                 Small                   Large
Input rate                   Fast                  Fast                    Fast
Output rate                  Fast                  Slow                    Fast
Process complexity           High                  High                    Low
Input/process spikiness      Low                   Low                     High
Speed or accuracy?           Speed                 Accuracy                Speed
Cycles?                      Yes                   No                      Yes
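One rough way to answer "which ones are similar?" from the characteristics table is to count the rows on which two use cases agree. The sketch below transcribes the table values from the slide. The counting metric and the names `SHAPES` and `shared_characteristics` are illustrative assumptions, not part of the talk.

```python
from itertools import combinations

# Row labels and values transcribed from the characteristics table.
CHARACTERISTICS = [
    "input record size", "co-processed data size", "archive size",
    "input rate", "output rate", "process complexity",
    "input/process spikiness", "speed or accuracy", "cycles",
]

SHAPES = {
    "Personalized Search":   ["Small", "Large", "Large", "Fast", "Fast", "High", "Low", "Speed", "Yes"],
    "Personalized Medicine": ["Large", "Large", "Small", "Fast", "Slow", "High", "Low", "Accuracy", "No"],
    "Market Segmenting":     ["Small", "Large", "Large", "Fast", "Fast", "Low", "High", "Speed", "Yes"],
}


def shared_characteristics(a, b):
    """Return the characteristics on which use cases a and b have the same value.

    Assumption: a plain match count is a crude similarity measure chosen for illustration.
    """
    return [
        name
        for name, x, y in zip(CHARACTERISTICS, SHAPES[a], SHAPES[b])
        if x == y
    ]


if __name__ == "__main__":
    for a, b in combinations(SHAPES, 2):
        shared = shared_characteristics(a, b)
        print(f"{a} vs {b}: {len(shared)} of {len(CHARACTERISTICS)} shared")
```

Under this crude measure, Personalized Search and Market Segmenting share the most characteristics, which matches the deck later treating both as percolation problems.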
Percolation in Classic Form
Real-time data source
Real-time insertion
Data store
Offline percolation of recent data

Large-scale Incremental Processing Using Distributed Transactions and Notifications
http://research.google.com/pubs/pub36726.html
Percolation in Classic Form
Real-time data source
Real-time insertion
Data store
Offline percolation of recent data

With a queue: Real-time data source, Real-time insertion, Queue, Delayed insertion, Data store
Queued data are unavailable for action - not percolation
Percolation of a Composite Store
Real-time data source
Real-time insertion
Data store
Offline percolation
Index
Both parts visible
Market Segmentation
• Divide customers into subsets with common needs
• Design specific strategies for each subset
• Major emphasis on "fresh" data
Market Segmentation
Real-time transactions
Feature Extraction
Customer history
Assign Segment (search)
Clustering
Market Segments
Edge labels: db, query
What does this have to do with percolation?
Percolator 1
Real-time transactions
Feature Extraction
Customer history
Feature extraction is percolation because it is triggered by the arrival of a new record and because it updates that new record.
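A minimal sketch of that trigger-and-update behaviour, assuming a toy in-memory store: insertion makes the record visible immediately and marks it pending, and a separate percolation pass later updates only the records that arrived. `PercolatingStore`, `extract_features`, the `amount` field, and the 100.0 threshold are all hypothetical names and values for illustration. This is not the Percolator API or an M7 API.

```python
class PercolatingStore:
    """Toy in-memory store; rows are visible as soon as they are inserted."""

    def __init__(self, on_new_record):
        self.rows = {}                      # key -> record (a dict)
        self.pending = set()                # keys that arrived since the last pass
        self.on_new_record = on_new_record  # hypothetical per-record callback

    def insert(self, key, record):
        # Real-time insertion: store the record and flag it, nothing else.
        self.rows[key] = record
        self.pending.add(key)

    def percolate(self):
        # Offline percolation: touch only the records that were flagged.
        processed = 0
        while self.pending:
            key = self.pending.pop()
            self.on_new_record(self.rows[key])
            processed += 1
        return processed


def extract_features(record):
    """Hypothetical feature extraction that updates the triggering record in place."""
    record["features"] = {"large_purchase": record["amount"] >= 100.0}


if __name__ == "__main__":
    store = PercolatingStore(extract_features)
    store.insert("t1", {"amount": 250.0})
    store.percolate()
    print(store.rows["t1"])
```

Keeping the expensive step out of `insert` is the point of the design: ingest stays cheap and the derived fields catch up shortly afterwards.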
Percolator 2
Real-time transactions
Customer history
Assign Segment (search)
Market Segments
Edge labels: db, query
Market segment assignment is percolation because it is triggered by the arrival of a new record and because only that record's segment is updated.

What about the clustering step?
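Segment assignment as a search can be sketched as a nearest-centroid lookup: the market segments act as the db and the new customer record is the query. The two-dimensional feature vectors, the Euclidean distance, and the names `assign_segment` and `on_new_record` are assumptions made for illustration. The slides do not say how segments are represented or matched.

```python
import math


def assign_segment(features, centroids):
    """Return the name of the segment whose centroid is nearest to `features`.

    Assumption: segments are summarized by centroids and compared with
    Euclidean distance.
    """
    return min(centroids, key=lambda name: math.dist(features, centroids[name]))


def on_new_record(customer, centroids):
    """Percolation step: only the arriving customer's segment is updated."""
    customer["segment"] = assign_segment(customer["features"], centroids)


if __name__ == "__main__":
    # Hypothetical segments: (average basket value, visits per month).
    segments = {"bargain": (10.0, 1.0), "premium": (200.0, 5.0)}
    customer = {"features": (180.0, 4.0)}
    on_new_record(customer, segments)
    print(customer["segment"])
```

The centroids themselves would come from the clustering step, which the next slide treats separately.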
Scheduled Update - Not Percolation
Customer history
Clustering
Market Segments
The clustering loop is not percolation since it runs at fixed intervals instead of incrementally as updates are received. It also doesn't update just a single customer record.
Personalized Search
• Observe web users' activity over an extended period
• Understand individual user interests
• Customize search results for each user
• ... as fast as possible
Personal Search History and Web Index
Persona Activity
Persona Histories
Persona Index
Web Crawl
feature extraction
Doc Store
Doc Index
Search
Edge labels: db, query, update, trigger
Percolator 1
Web Crawl
feature extraction
Doc Store
Expensive feature extraction does not block document ingest
Percolators 2 and 3
Persona Activity
Persona Histories
Persona Index
Web Crawl
Doc Store
Doc Index
Edge labels: update
Percolator 4
Search
Persona Activity
Persona Histories
Persona Index
Edge labels: db, query, update
Updates to personas trigger updates in related personas
Percolator 5?
Persona Index
Persona Histories
Search
Doc Index
Edge labels: db, query, trigger
Persona and doc index updates trigger a personalization refresh
Pattern Context
Persona Activity
Web Crawl
Encapsulated Process
Cyclic Dependency Graph

Percolator Thoughts
• M7 tables are great as the first persistence point in percolation
• In-memory flag column family works great for triggering updates
- Efficient - eliminates need for queuing
- Fast triggering with row & column Bloom filters

• Percolation is best supported by dedicated column families
- Percolators' I/O characteristics differ
- M7 works especially well because it supports lots of column families
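For readers unfamiliar with the Bloom filters mentioned above, here is a small generic sketch of the data structure: a compact bit array that can say "definitely not present" or "possibly present" for a key, which is what makes a quick "does this row have a flag set?" check cheap. The class, its sizes, the SHA-256-based hashing, and the `row:column` key format are illustrative assumptions and say nothing about how M7 implements its filters.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


if __name__ == "__main__":
    flagged = BloomFilter()
    flagged.add("customer42:needs_update")  # hypothetical row:column key
    print(flagged.might_contain("customer42:needs_update"))
```

A trigger scan can skip any row for which `might_contain` returns False and only read the rows that pass the filter.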
Cyclic Dependency Graph, M7 Schema

Personalized Medicine
1. Select Tests
2. Draw Biosample
3. Genome Sequencing & Analysis
4. Reporting
5. Interpretation & Follow-up
Personalized Medicine Applications
• Pre-conception screening
• Clinical research & trials
- Drug re-targeting

• Therapeutics
- Companion diagnostics
- Therapy selection
Personalized Medicine
Patient history (EHR)
EHR archive
Insert (eventually)
Genome Sample
Sequence extraction
Patient health context
Search
Ranked therapies
Edge labels: db, query

Here we do not see real-time data pushed to a persistence layer and processed offline. This pattern does not fit with percolation.
Personalized Medicine
Patient history (EHR)
EHR archive
Insert (eventually)
Genome Sample
Sequence extraction
Patient health context
Search
Ranked therapies
Edge labels: db, query

User-based recommendation pattern
Recommendation in Classic Form
Queue
History Archive
Recent history
User Search
Ranked similar histories
Edge labels: db, query
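The user-based flow above, where a recent history is used as a query against an archive of histories, can be sketched as an online similarity ranking. Jaccard similarity over item sets, and the names `jaccard` and `rank_similar_histories`, are assumptions chosen for illustration. The slides do not specify the similarity measure.

```python
def jaccard(a, b):
    """Jaccard similarity of two item collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_similar_histories(recent, archive, top_n=3):
    """Rank archived histories by similarity to the recent history.

    `archive` maps a user id to that user's list of items. The comparison
    runs at query time, so results reflect the archive as it is right now.
    """
    scored = [(jaccard(recent, history), user) for user, history in archive.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by id
    return [(user, score) for score, user in scored[:top_n]]


if __name__ == "__main__":
    # Hypothetical archive of per-user histories.
    archive = {"u1": ["a", "b", "c"], "u2": ["c", "d"], "u3": ["x"]}
    for user, score in rank_similar_histories(["a", "b"], archive):
        print(user, round(score, 2))
```

In the personalized medicine case the query would be the patient health context and the ranked histories would lead to ranked therapies.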
Item-Based Recommendation in Classic Form
Off-line analysis: Queue, History archive, Cooccurrence analysis, Item linkage
Interactive recommendation: Recent history, Search, Ranked items
Edge labels: db, query
Recommendation Thoughts
• Item-based recommendation is for efficiency
- expensive step in computing co-occurrence can be done offline and cached prior to a user query

• User-based recommendation is for accuracy
- user comparisons are done online to find the current best recommendation

• MapR is great for recommendation
- M7 tables are high I/O performance, can eliminate queues
- Faster archive updates with optimized MapReduce
- High-availability for mission LIFE critical applications
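The item-based split between an expensive offline step and a cheap online lookup might look like the sketch below. Raw co-occurrence counts are used as the linkage score, which is a simplifying assumption, and the function names `build_item_linkage` and `recommend` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_item_linkage(histories):
    """Offline step: count how often each ordered pair of items shares a history."""
    linkage = defaultdict(Counter)
    for items in histories:
        for a, b in permutations(set(items), 2):
            linkage[a][b] += 1
    return linkage


def recommend(recent, linkage, top_n=3):
    """Online step: score items linked to the recent history using cached counts."""
    seen = set(recent)
    scores = Counter()
    for item in seen:
        for other, count in linkage.get(item, {}).items():
            if other not in seen:
                scores[other] += count
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical archive of item histories.
    histories = [["a", "b", "c"], ["a", "b"], ["b", "d"]]
    linkage = build_item_linkage(histories)   # cached ahead of any query
    print(recommend(["a"], linkage))
```

Only `recommend` runs at query time, so the cost of scanning the whole archive is paid once per scheduled rebuild instead of once per user request.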
Business Use Cases & Design Patterns
Recommender - Personalized Medicine
Pattern X - Health data
Percolator - Personalized Search
Percolator - Other Industry
Percolator - Personalized Medicine
Pattern X - Other Industry
Summary: Best Practices
• Look at the big picture
- Find recurring patterns

• Design systems at a high-level
- Solve problems once and reuse components
- Increase R&D productivity
- Decrease operational and maintenance overhead
Thank You!
Allen Day, PhD
Principal Data Scientist, MapR Technologies
aday@maprtech.com, allenday@allenday.com
@allenday, @mapr
©MapR Technologies - Confidential

Weitere Àhnliche Inhalte

Was ist angesagt?

Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryDataWorks Summit
 
An introduction to Big Data
An introduction to Big DataAn introduction to Big Data
An introduction to Big DataForwardSprint
 
Rob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopRob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopGhassan Al-Yafie
 
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Agile Testing Alliance
 
Real World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionReal World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionCodemotion
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and DeploymentCisco Canada
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient DataWorks Summit/Hadoop Summit
 
A Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium EnterpriseA Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium EnterpriseRidwan Fadjar
 
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)Spark Summit
 
LendingClub RealTime BigData Platform with Oracle GoldenGate
LendingClub RealTime BigData Platform with Oracle GoldenGateLendingClub RealTime BigData Platform with Oracle GoldenGate
LendingClub RealTime BigData Platform with Oracle GoldenGateRajit Saha
 
NoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBNoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBMapR Technologies
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningMapR Technologies
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with HadoopDataWorks Summit
 
MapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Technologies
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...DataWorks Summit
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streamshktripathy
 
How big data and AI saved the day: critical IP almost walked out the door
How big data and AI saved the day: critical IP almost walked out the doorHow big data and AI saved the day: critical IP almost walked out the door
How big data and AI saved the day: critical IP almost walked out the doorDataWorks Summit
 
Beyond Kerberos and Ranger - Tips to discover, track and manage risks in hybr...
Beyond Kerberos and Ranger - Tips to discover, track and manage risks in hybr...Beyond Kerberos and Ranger - Tips to discover, track and manage risks in hybr...
Beyond Kerberos and Ranger - Tips to discover, track and manage risks in hybr...DataWorks Summit
 

Was ist angesagt? (20)

Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy Industry
 
An introduction to Big Data
An introduction to Big DataAn introduction to Big Data
An introduction to Big Data
 
Rob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopRob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoop
 
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
 
Real World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionReal World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in Production
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
A Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium EnterpriseA Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium Enterprise
 
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)
 
LendingClub RealTime BigData Platform with Oracle GoldenGate
LendingClub RealTime BigData Platform with Oracle GoldenGateLendingClub RealTime BigData Platform with Oracle GoldenGate
LendingClub RealTime BigData Platform with Oracle GoldenGate
 
NoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBNoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DB
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with Hadoop
 
MapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data Platform
 
Keys for Success from Streams to Queries
Keys for Success from Streams to QueriesKeys for Success from Streams to Queries
Keys for Success from Streams to Queries
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
 
MapR & Skytree:
MapR & Skytree: MapR & Skytree:
MapR & Skytree:
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streams
 
How big data and AI saved the day: critical IP almost walked out the door
How big data and AI saved the day: critical IP almost walked out the doorHow big data and AI saved the day: critical IP almost walked out the door
How big data and AI saved the day: critical IP almost walked out the door
 
Beyond Kerberos and Ranger - Tips to discover, track and manage risks in hybr...
Beyond Kerberos and Ranger - Tips to discover, track and manage risks in hybr...Beyond Kerberos and Ranger - Tips to discover, track and manage risks in hybr...
Beyond Kerberos and Ranger - Tips to discover, track and manage risks in hybr...
 

Andere mochten auch

Online %26 offline marketing
Online %26 offline marketingOnline %26 offline marketing
Online %26 offline marketingAchi Amarasinghe
 
State of the Consumer Banking Experience
State of the Consumer Banking ExperienceState of the Consumer Banking Experience
State of the Consumer Banking ExperienceInvoca
 
Machine Learning at Scale
Machine Learning at ScaleMachine Learning at Scale
Machine Learning at ScaleMadhukara Phatak
 
First-passage percolation on random planar maps
First-passage percolation on random planar mapsFirst-passage percolation on random planar maps
First-passage percolation on random planar mapsTimothy Budd
 
mtc All Hands 8/15 Werte
mtc All Hands 8/15 Wertemtc All Hands 8/15 Werte
mtc All Hands 8/15 WerteArne Krueger
 
Paper Review: An exact mapping between the Variational Renormalization Group ...
Paper Review: An exact mapping between the Variational Renormalization Group ...Paper Review: An exact mapping between the Variational Renormalization Group ...
Paper Review: An exact mapping between the Variational Renormalization Group ...Kai-Wen Zhao
 
Elastic Search
Elastic SearchElastic Search
Elastic SearchLukas Vlcek
 
Artificial intelligence 2015: Quo Vadis?
Artificial intelligence 2015: Quo Vadis?Artificial intelligence 2015: Quo Vadis?
Artificial intelligence 2015: Quo Vadis?Sergey Shelpuk
 
Network-Growth Rule Dependence of Fractal Dimension of Percolation Cluster on...
Network-Growth Rule Dependence of Fractal Dimension of Percolation Cluster on...Network-Growth Rule Dependence of Fractal Dimension of Percolation Cluster on...
Network-Growth Rule Dependence of Fractal Dimension of Percolation Cluster on...Shu Tanaka
 
Machine Learning and Logging for Monitoring Microservices
Machine Learning and Logging for Monitoring Microservices Machine Learning and Logging for Monitoring Microservices
Machine Learning and Logging for Monitoring Microservices Daniel Berman
 
Scalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at PinterestScalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at PinterestKrishna Gade
 
Percolation
PercolationPercolation
PercolationESUG
 
Interlayer-Interaction Dependence of Latent Heat in the Heisenberg Model on a...
Interlayer-Interaction Dependence of Latent Heat in the Heisenberg Model on a...Interlayer-Interaction Dependence of Latent Heat in the Heisenberg Model on a...
Interlayer-Interaction Dependence of Latent Heat in the Heisenberg Model on a...Shu Tanaka
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Saurabh Kaushik
 
Predictive analytics in mobility
Predictive analytics in mobilityPredictive analytics in mobility
Predictive analytics in mobilityEktimo
 
BigData & Supply Chain: A "Small" Introduction
BigData & Supply Chain: A "Small" IntroductionBigData & Supply Chain: A "Small" Introduction
BigData & Supply Chain: A "Small" IntroductionIvan Gruer
 
Logging : How much is too much? Network Security Monitoring Talk @ hasgeek
Logging : How much is too much? Network Security Monitoring Talk @ hasgeekLogging : How much is too much? Network Security Monitoring Talk @ hasgeek
Logging : How much is too much? Network Security Monitoring Talk @ hasgeekvivekrajan
 

Andere mochten auch (20)

Online %26 offline marketing
Online %26 offline marketingOnline %26 offline marketing
Online %26 offline marketing
 
State of the Consumer Banking Experience
State of the Consumer Banking ExperienceState of the Consumer Banking Experience
State of the Consumer Banking Experience
 
Logging in moodle
Logging in moodleLogging in moodle
Logging in moodle
 
Percolation Model and Controllability
Percolation Model and ControllabilityPercolation Model and Controllability
Percolation Model and Controllability
 
Machine Learning at Scale
Machine Learning at ScaleMachine Learning at Scale
Machine Learning at Scale
 
First-passage percolation on random planar maps
First-passage percolation on random planar mapsFirst-passage percolation on random planar maps
First-passage percolation on random planar maps
 
mtc All Hands 8/15 Werte
mtc All Hands 8/15 Wertemtc All Hands 8/15 Werte
mtc All Hands 8/15 Werte
 
Percolation
PercolationPercolation
Percolation
 
Paper Review: An exact mapping between the Variational Renormalization Group ...
Paper Review: An exact mapping between the Variational Renormalization Group ...Paper Review: An exact mapping between the Variational Renormalization Group ...
Paper Review: An exact mapping between the Variational Renormalization Group ...
 
Elastic Search
Elastic SearchElastic Search
Elastic Search
 
Artificial intelligence 2015: Quo Vadis?
Artificial intelligence 2015: Quo Vadis?Artificial intelligence 2015: Quo Vadis?
Artificial intelligence 2015: Quo Vadis?
 
Network-Growth Rule Dependence of Fractal Dimension of Percolation Cluster on...
Network-Growth Rule Dependence of Fractal Dimension of Percolation Cluster on...Network-Growth Rule Dependence of Fractal Dimension of Percolation Cluster on...
Network-Growth Rule Dependence of Fractal Dimension of Percolation Cluster on...
 
Machine Learning and Logging for Monitoring Microservices
Machine Learning and Logging for Monitoring Microservices Machine Learning and Logging for Monitoring Microservices
Machine Learning and Logging for Monitoring Microservices
 
Scalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at PinterestScalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at Pinterest
 
Percolation
PercolationPercolation
Percolation
 
Interlayer-Interaction Dependence of Latent Heat in the Heisenberg Model on a...
Interlayer-Interaction Dependence of Latent Heat in the Heisenberg Model on a...Interlayer-Interaction Dependence of Latent Heat in the Heisenberg Model on a...
Interlayer-Interaction Dependence of Latent Heat in the Heisenberg Model on a...
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
 
Predictive analytics in mobility
Predictive analytics in mobilityPredictive analytics in mobility
Predictive analytics in mobility
 
BigData & Supply Chain: A "Small" Introduction
BigData & Supply Chain: A "Small" IntroductionBigData & Supply Chain: A "Small" Introduction
BigData & Supply Chain: A "Small" Introduction
 
Logging : How much is too much? Network Security Monitoring Talk @ hasgeek
Logging : How much is too much? Network Security Monitoring Talk @ hasgeekLogging : How much is too much? Network Security Monitoring Talk @ hasgeek
Logging : How much is too much? Network Security Monitoring Talk @ hasgeek
 

Ähnlich wie 20131011 - Los Gatos - Netflix - Big Data Design Patterns

20131111 - Santa Monica - BigDataCamp - Big Data Design Patterns
20131111 - Santa Monica - BigDataCamp - Big Data Design Patterns20131111 - Santa Monica - BigDataCamp - Big Data Design Patterns
20131111 - Santa Monica - BigDataCamp - Big Data Design PatternsAllen Day, PhD
 
2013.12.12 - Sydney - Big Data Analytics
2013.12.12 - Sydney - Big Data Analytics2013.12.12 - Sydney - Big Data Analytics
2013.12.12 - Sydney - Big Data AnalyticsAllen Day, PhD
 
Innovating With Data and Analytics
Innovating With Data and AnalyticsInnovating With Data and Analytics
Innovating With Data and AnalyticsVMware Tanzu
 

20131011 - Los Gatos - Netflix - Big Data Design Patterns

  • 1. Design Patterns for Big Data Architecture: Best Strategies for Streamlined [Simple, Powerful] Design. Allen Day, PhD, Data Scientist, MapR Technologies. October 2013. ©MapR Technologies - Confidential
  • 2. Me, Us. ‱ Allen Day, Principal Data Scientist, MapR: R contributor (10 yr), Hadoop (6 yr); Human Genetics (UCLA Medicine), Machine Learning. ‱ MapR: distributes open source components for Hadoop; adds major enhancements for performance, high availability, and ease of use. ‱ See also: “allenday” most places (twitter, github, etc.); aday@maprtech.com, allenday@allenday.com; @mapR
  • 3. Three Business Use Cases: Personalized Search; Personalized Medicine; Market Segmentation
  • 4. Three Business Use Cases. Personalized Search: ‱ Public web index + personal search history ‱ Custom ranking of results. Personalized Medicine: ‱ Patient medical history ‱ Genomic info. ‱ Match against database of therapies. Market Segmentation: ‱ Group similar customers ‱ Target with cross-sell / upsell campaign
  • 5. Three Business Use Cases. Personalized Search (personal data): ‱ Public web index + personal search history ‱ Custom ranking of results. Personalized Medicine (personal data): ‱ Patient medical history ‱ Genomic info. ‱ Match against database of therapies. Market Segmentation (marketing): ‱ Group similar customers ‱ Target with cross-sell / upsell campaign. Which ones are similar?
  • 7. Three Business Use Cases. Personalized Search (personal data): ‱ Public web index + personal search history ‱ Custom ranking of results. Personalized Medicine (personal data): ‱ Patient medical history ‱ Genomic info. ‱ Match against database of therapies. Market Segmentation (marketing): ‱ Group similar customers ‱ Target with cross-sell / upsell campaign. How can you tell?
  • 8. But First: WHAT IS A DESIGN PATTERN?
  • 9. “a design pattern is a general reusable solution to a commonly occurring problem within a given context in software design. A design pattern is not a finished design that can be transformed directly into source or machine code. It is a description or template for how to solve a problem that can be used in many different situations” http://en.wikipedia.org/wiki/Software_design_pattern
  • 10. History of Design Pattern Ideation. 1977: Architecture & Civil Engineering. 1994: OO Software Architecture. 2012: Parallelization Software. ?: Application Parallelization
  • 12. Big Data Application Shapes 1. How big is your input record? 2. How big is the data that is relevant to processing the input record? 3. How big is the total data that could be relevant to processing the input? 4. How fast do inputs flow in? 5. How fast do outputs need to flow out? 6. How complex (unstructured) are 1-5? 7. How predictable are 1-6? (spikiness, variance) 8. Is accuracy more important than speed? 9. Does the processing contain cycles (feedback loops)?
  • 13. Big Data Application Shapes 1. How big is your input record? 2. How big is the data that is relevant to processing the input record? 3. How big is the total data that could be relevant to processing the input? 4. How fast do inputs flow in? 5. How fast do outputs need to flow out? 6. How complex (unstructured) are 1-5? 7. How predictable are 1-6? (spikiness, variance) 8. Is accuracy more important than speed? 9. Does the processing contain cycles (feedback loops)? Labels: Volume, Velocity, Variety
  • 16. Choose a Pattern: Volume & Velocity 1. How big is your target data? <10 GB mid ? ? A Single element at a time >200 GB 2. How big is your query data? One pass over 100% B C Big storage Streaming Multiple passes over big chunks 3. How fast do you need a result? Throughput > response D Nearline Analytics < 100s (human scale) E Exploratory Analysis
  • 17. Twitter Zeitgeist as a Composite of Design Patterns. Live data source (e.g. Twitter Firehose); B: Big storage; C: Streaming; D: Nearline Analytics; Downstream applications
  • 19. Big Data Application Shapes 1. How big is your input record? 2. How big is the data that is relevant to processing the input record? 3. How big is the total data that could be relevant to processing the input? 4. How fast do inputs flow in? 5. How fast do outputs need to flow out? 6. How complex (unstructured) are 1-5? 7. How predictable are 1-6? (spikiness, variance) 8. Is accuracy more important than speed? 9. Does the processing contain cycles (feedback loops)? Volume Velocity Variety Intents & Methods ©MapR Technologies - Confidential
  • 20. Application characteristic (Personalized Search / Personalized Medicine / Market Segmenting): Input record size: Small / Large / Small. Co-processed data size: Large / Large / Large. Archive size: Large / Small / Large. Input rate: Fast / Fast / Fast. Output rate: Fast / Slow / Fast. Process complexity: High / High / Low. Input/process spikiness: Low / Low / High. Speed or accuracy? Speed / Accuracy / Speed. Cycles? Yes / No / Yes.
  • 21. Percolation in Classic Form Real-time data source Real-time insertion Data store Offline percolation of recent data. Reference: Large-scale Incremental Processing Using Distributed Transactions and Notifications http://research.google.com/pubs/pub36726.html
  • 22. Percolation in Classic Form Real-time data source Data store Offline percolation of recent data Queue Data store Real-time insertion Queued data are unavailable for action – not percolation Real-time insertion Delayed insertion
  • 23. Percolation in Classic Form Real-time data source Real-time insertion Data store Offline percolation of recent data
  • 24. Percolation of a Composite Store Real-time data source Real-time insertion Data store Offline percolation Index Both parts visible
  • 25. Market Segmentation • Divide customers into subsets with common needs • Design specific strategies for each subset • Major emphasis on “fresh” data
  • 26. Market Segmentation Feature Extraction Real-time transactions Customer history Assign Segment (search) db Market Segments query Clustering What does this have to do with percolation?
  • 27. Percolator 1 Feature Extraction Real-time transactions Customer history Feature extraction is percolation because it is triggered by the arrival of a new record and because it updates that new record.
  • 28. Percolator 2 Real-time transactions Customer history Assign Segment (search) db Market Segments query Market segment assignment is percolation because it is triggered by the arrival of a new record and because only that record's segment is updated. What about the clustering step?
  • 29. Scheduled Update - Not Percolation Customer history Clustering Market Segments The clustering loop is not percolation since it runs at fixed intervals instead of incrementally as updates are received. It also doesn't update just a single customer record.
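The distinction drawn in slides 27 to 29 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `CustomerStore`, `percolate`, `recluster`, the `mean_spend` feature, and the one-dimensional nearest-centroid rule are all hypothetical names and simplifications, and an in-memory dict stands in for the real data store.

```python
from collections import defaultdict


class CustomerStore:
    """In-memory stand-in (hypothetical) for the customer-history data store."""

    def __init__(self):
        self.records = {}     # customer_id -> record dict
        self.pending = set()  # ids inserted since the last percolation pass

    def insert(self, customer_id, amounts):
        # Real-time insertion: store the raw record and return immediately.
        self.records[customer_id] = {"amounts": list(amounts)}
        self.pending.add(customer_id)


def percolate(store, centroids):
    """Percolation: touch only the records that arrived since the last pass."""
    touched = sorted(store.pending)
    for cid in touched:
        rec = store.records[cid]
        amounts = rec["amounts"]
        # Percolator 1: feature extraction updates the new record itself.
        rec["mean_spend"] = sum(amounts) / len(amounts) if amounts else 0.0
        # Percolator 2: assign the nearest existing segment to that record only.
        rec["segment"] = min(
            centroids, key=lambda s: abs(centroids[s] - rec["mean_spend"])
        )
    store.pending.clear()
    return touched


def recluster(store):
    """Scheduled update (not percolation): recompute segments from every record."""
    groups = defaultdict(list)
    for rec in store.records.values():
        if "segment" in rec:
            groups[rec["segment"]].append(rec["mean_spend"])
    return {seg: sum(vals) / len(vals) for seg, vals in groups.items()}


store = CustomerStore()
centroids = {"low": 10.0, "high": 100.0}  # assumed output of an earlier clustering run
store.insert("c1", [8, 12])
store.insert("c2", [90, 130])
touched = percolate(store, centroids)  # processes only c1 and c2
new_centroids = recluster(store)       # scans the whole store
```

Here `percolate` does work proportional to the number of new records, while `recluster` always reads the full history, which is why the deck treats the latter as a scheduled job rather than a percolator.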
  • 30. Personalized Search • Observe web users’ activity over an extended period • Understand individual user interests • Customize search results for each user • …as fast as possible
  • 31. Personal Search History and Web Index Search Persona Activity db query Persona update Histories trigger query Search Web Crawl feature extraction Doc Store db update trigger Doc Index Persona Index
  • 32. Percolator 1 Expensive feature extraction does not block document ingest Web Crawl feature extraction Doc Store
  • 33. Percolators 2 and 3 Persona Activity Persona update Histories Web Crawl Doc Store update Doc Index Persona Index
  • 34. Percolator 4 Updates to personas trigger updates in related personas Search Persona Activity db query Persona update Histories Persona Index
  • 35. Percolator 5? Persona Index Persona Histories trigger query Search db trigger Doc Index Persona and doc index updates trigger a personalization refresh
  • 37. Cyclic Dependency Graph
  • 38. Percolator Thoughts • M7 tables are great as the first persistence point in percolation • In-memory flag column family works great for triggering updates – Efficient: eliminates the need for queuing – Fast triggering with row & column Bloom filters • Percolation is best supported by dedicated column families – Percolators' I/O characteristics differ – M7 works especially well because it supports lots of column families
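A minimal sketch of the flag-column idea from slide 38, assuming a toy in-memory table. `Table`, `run_percolator`, and the family names `flag_extract` and `flag_index` are hypothetical; this is not the M7 or HBase API, and it scans rows linearly instead of using Bloom filters. It shows one flag family per percolator and how one percolator can trigger the next by setting a flag, with no external queue.

```python
class Table:
    """Toy wide-column table: row key -> column family -> {column: value}."""

    def __init__(self, families):
        self.families = tuple(families)
        self.rows = {}

    def put(self, row, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        fams = self.rows.setdefault(row, {f: {} for f in self.families})
        fams[family][column] = value

    def flagged(self, flag_family):
        # Rows whose flag family is non-empty are waiting for that percolator.
        return [r for r, fams in self.rows.items() if fams[flag_family]]

    def clear_flag(self, row, flag_family):
        self.rows[row][flag_family].clear()


def run_percolator(table, flag_family, work):
    """Process only the flagged rows, clearing each flag once its work is done."""
    done = []
    for row in table.flagged(flag_family):
        work(table, row)
        table.clear_flag(row, flag_family)
        done.append(row)
    return done


def extract_features(table, row):
    text = table.rows[row]["raw"]["body"]
    table.put(row, "features", "n_words", len(text.split()))
    table.put(row, "flag_index", "dirty", True)  # trigger the next percolator


def update_index(table, row):
    n_words = table.rows[row]["features"]["n_words"]
    table.put(row, "index", "bucket", "long" if n_words > 3 else "short")


table = Table(["raw", "features", "index", "flag_extract", "flag_index"])
table.put("doc1", "raw", "body", "design patterns for big data")
table.put("doc1", "flag_extract", "dirty", True)  # set at insertion time
first = run_percolator(table, "flag_extract", extract_features)
second = run_percolator(table, "flag_index", update_index)
```

Keeping each percolator's flags and outputs in their own families mirrors the slide's point that percolators have different I/O characteristics and are easier to manage when they do not share a column family.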
  • 39. Cyclic Dependency Graph, M7 Schema
  • 40. Personalized Medicine 1. Select Tests 2. Draw Biosample 3. Genome Sequencing & Analysis 4. Reporting 5. Interpretation & Follow-up
  • 41. Personalized Medicine Applications • Pre-conception screening • Clinical research & trials – Drug re-targeting • Therapeutics – Companion diagnostics – Therapy selection
  • 42. Personalized Medicine Patient history (EHR) EHR archive Insert (eventually) db Sequence extraction Patient health context query Search Ranked therapies Genome Sample Here we do not see real-time data pushed to a persistence layer and processed offline. This pattern does not fit with percolation…
  • 43. Personalized Medicine Patient history (EHR) EHR archive Insert (eventually) db Sequence extraction Genome Sample Patient health context query Search User-based recommendation pattern Ranked therapies
  • 44. Recommendation in Classic Form Queue History Archive db Recent history query User Search Ranked similar histories
  • 45. Item-Based Recommendation in Classic Form Queue History archive Cooccurrence analysis Off-line analysis Recent history query Item linkage db Search Interactive recommendation Ranked items
  • 46. Recommendation Thoughts • Item-based recommendation is for efficiency – expensive step in computing co-occurrence can be done offline and cached prior to a user query • User-based recommendation is for accuracy – user comparisons are done online to find the current best recommendation • MapR is great for recommendation – M7 tables are high I/O performance, can eliminate queues – Faster archive updates with optimized MapReduce – High-availability for mission LIFE critical applications
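The trade-off on slide 46 can be illustrated with a small Python sketch. The `histories` data, the function names, and the use of Jaccard similarity for comparing histories are assumptions made for the example; the deck does not specify a similarity measure. The item-based path precomputes co-occurrence counts offline and answers queries with a lookup, while the user-based path compares the query against every stored history at query time.

```python
from collections import Counter
from itertools import combinations

# Hypothetical history archive: user -> set of items seen.
histories = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"b", "d"},
}


def build_cooccurrence(histories):
    """Offline step: count how often each ordered item pair shares a history."""
    counts = Counter()
    for items in histories.values():
        for x, y in combinations(sorted(items), 2):
            counts[(x, y)] += 1
            counts[(y, x)] += 1
    return counts


def recommend_item_based(cooc, recent, k=2):
    """Online step: cheap scoring against the cached co-occurrence counts."""
    scores = Counter()
    for (x, y), n in cooc.items():
        if x in recent and y not in recent:
            scores[y] += n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_similar_histories(histories, recent, k=2):
    """User-based: compare the query with every stored history at query time."""
    ranked = sorted(histories, key=lambda u: (-jaccard(histories[u], recent), u))
    return ranked[:k]


cooc = build_cooccurrence(histories)              # expensive, done ahead of time
items = recommend_item_based(cooc, {"a"})         # fast path at query time
similar = rank_similar_histories(histories, {"a", "b"})  # full comparison online
```

The user-based ranking always reflects the current archive but costs a pass over all histories per query; the item-based lookup is cheap but only as fresh as the last offline co-occurrence run.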
  • 47. Business Use Cases & Design Patterns Recommender – Personalized Medicine Pattern X – Health data Percolator – Personalized Search Percolator – Other Industry Percolator – Personalized Medicine Pattern X – Other Industry
  • 48. Summary: Best Practices • Look at the big picture – Find recurring patterns • Design systems at a high-level – Solve problems once and reuse components – Increase R&D productivity – Decrease operational and maintenance overhead
  • 49. Thank You! Allen Day, PhD Principal Data Scientist, MapR Technologies aday@maprtech.com, allenday@allenday.com @allenday, @mapr

Editor's notes

  1. Shapes too big; overwhelming. I would describe three projects by short name; then add three distinct shapes, making two hearts since both are healthcare; start with all line drawings; too distracting to be in color.
  2. Shapes too big; overwhelming. I would describe three projects by short name; then add three distinct shapes, making two hearts since both are healthcare; start with all line drawings; too distracting to be in color.
  3. Shapes too big; overwhelming. I would describe three projects by short name; then add three distinct shapes, making two hearts since both are healthcare; start with all line drawings; too distracting to be in color.
  4. Shapes too big; overwhelming. I would describe three projects by short name; then add three distinct shapes, making two hearts since both are healthcare; start with all line drawings; too distracting to be in color.
  5. Shapes too big; overwhelming. I would describe three projects by short name; then add three distinct shapes, making two hearts since both are healthcare; start with all line drawings; too distracting to be in color.
  6. Talk track: Both genotyping and market segmentation solutions have a useful design component known as percolation. The key idea is that there is a fast push to store data and an offline processing step that modifies data. The modified data could go back to the same data store or… Speaker: you might note that we show real-time steps in red and non-real-time steps in black.
  7. Talk track: Both genotyping and market segmentation solutions have a useful design component known as percolation. The key idea is that there is a fast push to store data and an offline processing step that modifies data. The modified data could go back to the same data store or… Speaker: you might note that we show real-time steps in red and non-real-time steps in black.
  8. Talk track: Both genotyping and market segmentation solutions have a useful design component known as percolation. The key idea is that there is a fast push to store data and an offline processing step that modifies data. The modified data could go back to the same data store or… Speaker: you might note that we show real-time steps in red and non-real-time steps in black.
  9. Talk track: Both genotyping and market segmentation solutions have a useful design component known as percolation. The key idea is that there is a fast push to store data and an offline processing step that modifies data. The modified data could go back to the same data store or… Speaker: you might note that we show real-time steps in red and non-real-time steps in black.
  10. Talk track: In market segmentation, you want to identify useful segments of your customer base to target for a market campaign, for retention, for specific product offerings, etc. What makes “good” segments depends on what you want to do and how the environment changes. You may not know ahead of time what categories make useful segments. One way to find this is to capture customer histories and do a clustering step for discovery and definition of the market segments. This market segment db is then queried and updated in response to new real-time data insertion or new rounds of clustering. Specific feature extraction may also be a useful step from the customer history persistence layer.
  11. Talk track: the feature extraction step could be triggered by real-time data insertion

  12. Talk track: a second percolator processes new customer histories relative to the market segments.
  13. Talk track: the clustering step is not triggered by the real-time insertion; it is a scheduled step and thus not an example of percolation. What about the other use case we said was similar, the genotyping?
  14. Here, we trigger updates to the persona index based on either updates to persona history or updates to the document index. The idea is that if enough docs have changed or personas are finding “unusual” stuff, the persona is stale and we should recompute it.
  15. Talk track: MapR advantages include the smooth use of HBase on a MapR cluster for the persistence layer at the insertion point, or even better, the use of MapR M7 tables instead. There are two specific advantages to M7 (besides the all-important reliability): a) Less risk of delays, I/O storms, etc. that can happen with HBase. This is very important when pushing real-time data to a data store. b) The strategic advantage of using in-memory flags on column families, which is very efficient in M7, where you can have lots of column families as opposed to only a few in HBase, operationally speaking.
  16. Best practice: use one column family per percolator to manage their independent I/O characteristics. Prevent I/O storms.
  17. Talk track: Now let’s consider the other health data example, genome sequencing for personalized medicine. This is an approach that can be used to get the particular genomic characteristics of a cancerous tumor and compare to known patient histories in order to select the best option for a customized therapy.
  18. Talk track: While percolation is not used in this example, it does represent a specialized form of recommendation: user-based recommendation. In this genome sequencing/personalized medicine example, a very high bar is set for the accuracy of the recommendation. Here a user-based pattern is best. Let’s look at the generalized form.

  19. Talk track: Here is the basic pattern for user-based recommendation, as used in the real use case of personalized medicine. In contrast, in consumer recommendation for shopping, movies, or music, rapid response is key and accuracy is slightly less important. There, item-based recommendation is generally best, because the expensive step in computing co-occurrence can be done offline prior to a user query.
  20. Talk track: MapR advantages include the smooth use of HBase on a MapR cluster for the persistence layer at the insertion point, or even better, the use of MapR M7 tables instead. There are two specific advantages to M7 (besides the all-important reliability): a) Less risk of delays, I/O storms, etc. that can happen with HBase. This is very important when pushing real-time data to a data store. b) The strategic advantage of using in-memory flags on column families, which is very efficient in M7, where you can have lots of column families as opposed to only a few in HBase, operationally speaking.