AUA Data Science Meetup

David Gevorkyan, Senior Software Engineer at Netflix
DAVID GEVORKYAN
@davidgev
davidgevorkyan
GRADUATED AUA IN 2008
WHAT IS BIG DATA?
FASHIONABLE TERM?
80% OF DATA EXISTING IN ANY ENTERPRISE IS UNSTRUCTURED DATA
STRUCTURED DATA
SEMI-STRUCTURED
UNSTRUCTURED DATA
RDBMS, Data Warehousing
90% OF THE DATA IN THE WORLD TODAY HAS BEEN CREATED IN THE LAST TWO YEARS ALONE
Source: http://www.intel.com/content/www/us/en/communications/internet-minute-infographic.html
4 V'S OF BIG DATA
VOLUME (large amount of data)
VARIETY (sensors, video, audio, email, social)
VELOCITY (speed of data generation)
VERACITY (authenticity and/or accuracy)
SOLUTIONS REQUIRED
forces you to change the way you
• COLLECT
• TRANSPORT
• STORE
• MANAGE
• ANALYZE
• VISUALIZE
WHAT IS DATA SCIENCE?
DATA SCIENCE != STATISTICAL ANALYSIS
IT IS SCIENCE AND "ART" OF…
• EXPLORING THE UNKNOWN ABOUT DATA
"make discoveries while swimming in the data"
• REFINING THE RESULTS FOR ACCURACY
• DERIVING ACTIONABLE INSIGHT
• CREATING DATA-DRIVEN PRODUCTS
WHO ARE DATA SCIENTISTS?
Drew Conway, 2010
BIG DATA SCIENCE TOOLS?
• Scala, Java, Python, R… (bonus: Clojure, Haskell, Erlang)
• Hadoop, HDFS, MapReduce… (bonus: Spark, Storm, Tez)
• Scalding, HBase, Pig, Hive… (bonus: Shark, Titan, Giraph)
• Flume, Sqoop, ETL, Web scrapers… (bonus: Hume)
• SQL, RDBMS, DW, OLAP… (bonus: Solr, Elasticsearch)
• KNIME, Weka, RapidMiner… (bonus: SciPy, NumPy, Pandas)
• D3.js, Kibana, ggplot2, Tableau… (bonus: Shiny, Flare, Datameer)
• SPSS, Matlab, SAS… (the enterprise man)
• NoSQL, MongoDB, Cassandra, CouchDB
• And Yes!… MS-Excel: the most used, most underrated DS tool
GOAL?
• Revenue, revenue, revenue
• Improve the customer experience
• Increase operational efficiency
• GE: Optimize maintenance intervals for industrial products
• Google: Refine search and ad-serving algorithms
• Zynga: Optimize the game experience for both long-term engagement and revenue
• Netflix: Movie recommendations
• Kaplan: Uncover effective learning strategies
• eHarmony: Create happy relationships
WHO ARE WE?
TRADITIONAL METHODS DO NOT WORK ANYMORE…
EHARMONY CREATES THE HAPPIEST, MOST PASSIONATE AND MOST FULFILLING RELATIONSHIPS*
* ACCORDING TO A RECENT STUDY
438 MARRIAGES PER DAY
THE DIFFERENCE?
Compatibility Matching System®
COMPATIBILITY MATCHING
AFFINITY MATCHING
MATCH DISTRIBUTION
UNIDIRECTIONAL USER DEFINED CRITERIA
BIDIRECTIONAL
Leo
Ian
Steve
Nicolette
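The difference between unidirectional and bidirectional criteria can be sketched in a few lines. This is only an illustration, not eHarmony's implementation: the `User` fields, the age-range criterion, and every number below are made up; only the four names come from the slide.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    age: int
    min_age: int  # youngest partner this user accepts (hypothetical criterion)
    max_age: int  # oldest partner this user accepts (hypothetical criterion)


def accepts(seeker: User, candidate: User) -> bool:
    """Unidirectional check: does the candidate satisfy the seeker's criteria?"""
    return seeker.min_age <= candidate.age <= seeker.max_age


def bidirectional_matches(seeker: User, pool: list[User]) -> list[str]:
    """Keep only candidates where each side satisfies the other's criteria."""
    return [c.name for c in pool if accepts(seeker, c) and accepts(c, seeker)]


# Made-up ages and preferences, for illustration only.
nicolette = User("Nicolette", 30, 28, 40)
pool = [
    User("Leo", 35, 25, 32),
    User("Ian", 29, 35, 45),
    User("Steve", 50, 25, 35),
]

unidirectional = [c.name for c in pool if accepts(nicolette, c)]
bidirectional = bidirectional_matches(nicolette, pool)
```

With this sample data, the unidirectional filter keeps everyone Nicolette would accept, while the bidirectional filter also drops candidates whose own criteria exclude her.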
150 questions
Personality
Values
Attributes
Beliefs
Intellect
Energy
Sociability
Ambition
Kindness
Curiosity
Humor
Spirituality
COMPATIBILITY MATCHING
USER DEFINED CRITERIA
COMPATIBILITY MODELS
MONGODB
VOLDEMORT
MONGODB
DATA STORE NEEDS
POWERFUL INDEXING
MODELS
FAST MULTI-ATTRIBUTE SEARCHES
EASY TO MAINTAIN
60M+ QUERIES per day
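A multi-attribute search of the kind listed above might look like the following sketch. The field names (`age`, `region`) and the helper `build_match_filter` are hypothetical; the `$gte`, `$lte`, `$in`, and `$nin` operators are standard MongoDB query operators, and the index is written in the `(field, direction)` list form that pymongo accepts.

```python
def build_match_filter(min_age, max_age, regions, exclude_ids):
    """Build a MongoDB filter document for a multi-attribute search.

    All field names here are assumptions made for illustration.
    """
    return {
        "age": {"$gte": min_age, "$lte": max_age},
        "region": {"$in": list(regions)},
        "_id": {"$nin": list(exclude_ids)},
    }


# A compound index covering the searched attributes; 1 means ascending.
compound_index = [("region", 1), ("age", 1)]

query = build_match_filter(28, 40, ["CA", "NV"], [42])
```

Against a live server, and assuming a pymongo collection named `users`, this would be used as `users.create_index(compound_index)` followed by `users.find(query)`.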
MONGODB WINS
AUTO SCALING
BUILT-IN SHARDING
AUTO BALANCING
MMS
VOLDEMORT? THAT NAME SOUNDS FAMILIAR
VOLDEMORT
DATA STORE NEEDS
CRUD OPERATIONS
VARIED TRANSACTION SIZES
BILLION+ POTENTIAL MATCHES per day
VOLDEMORT WINS
AUTO REPLICATION
AUTO PARTITIONING
PLUGGABLE SERIALIZATION
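To make partitioning and replication concrete, here is a simplified sketch of hash-based key placement on a ring of partitions. It is not Voldemort's actual code: the MD5 hash, the partition-to-node layout, and the node names are all assumptions chosen for illustration.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (MD5 is an assumption here)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def replica_nodes(key: str, partition_owner: list[str], replication_factor: int) -> list[str]:
    """Walk the partition ring from the key's partition, collecting distinct nodes."""
    num_partitions = len(partition_owner)
    start = partition_for(key, num_partitions)
    nodes: list[str] = []
    for step in range(num_partitions):
        node = partition_owner[(start + step) % num_partitions]
        if node not in nodes:
            nodes.append(node)
        if len(nodes) == replication_factor:
            break
    return nodes


# Hypothetical cluster: 8 partitions spread over 3 nodes.
partition_owner = ["node-a", "node-b", "node-c", "node-a",
                   "node-b", "node-c", "node-a", "node-b"]
replicas = replica_nodes("user:12345", partition_owner, replication_factor=2)
```

Because placement depends only on the key's hash and the partition map, any client can compute where a key lives without a central lookup.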
AFFINITY MATCHING
65 30
3000 miles
[Figure: communication probability (PROB) vs. distance in miles]
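A curve like this can be estimated by bucketing pairs by distance and computing the share of pairs that communicated in each bucket. The sketch below assumes power-of-two buckets with lower edges of the form 2**k - 1, which matches the axis ticks on the slide; the function names and sample observations are made up.

```python
from collections import defaultdict


def distance_bucket(miles: float) -> int:
    """Return the lower edge (2**k - 1) of the power-of-two bucket for a distance."""
    k = (int(miles) + 1).bit_length() - 1
    return 2 ** k - 1


def communication_probability(pairs):
    """pairs: iterable of (distance_miles, communicated) -> {bucket: probability}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for miles, communicated in pairs:
        bucket = distance_bucket(miles)
        totals[bucket] += 1
        hits[bucket] += int(communicated)
    return {bucket: hits[bucket] / totals[bucket] for bucket in sorted(totals)}


# Made-up observations: (distance in miles, did the pair communicate?)
sample = [(0, True), (2, True), (2, False), (5, False), (3000, False)]
probabilities = communication_probability(sample)
```

The same bucketing idea applies to other pair features, such as height difference, with bucket edges chosen to suit the feature.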
[Figure: communication probability (PROB) vs. height difference in cm; annotated range: 4-8 in]
WORDS TO USE
SOME INSIGHT
DATA NEEDS FOR AFFINITY
50M+ REGISTERED USERS
10^3 ATTRIBUTES
10^7 DAILY MATCHES
250M+ PHOTOS
4B+ QUESTIONNAIRES ANSWERED
COMMUNICATION AGGREGATES
EVENT LISTENER SERVICE
USER ACTIVITY SERVICE
~5 ms RESPONSE TIMES
10K EVENTS PER SECOND
USER SERVICE
HOURLY, DAILY, TOTAL
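The hourly, daily, and total aggregates can be pictured as counters keyed by user and time window. The sketch below is an in-memory illustration under assumed names (`CommunicationAggregates`, `record`); the slides do not show how the real services store these counts.

```python
from collections import Counter
from datetime import datetime, timezone


class CommunicationAggregates:
    """In-memory hourly, daily, and total event counts per user (illustrative only)."""

    def __init__(self):
        self.hourly = Counter()
        self.daily = Counter()
        self.total = Counter()

    def record(self, user_id: str, timestamp: float) -> None:
        """Count one communication event, given a Unix timestamp in seconds."""
        ts = datetime.fromtimestamp(timestamp, tz=timezone.utc)
        self.hourly[(user_id, ts.strftime("%Y-%m-%dT%H"))] += 1
        self.daily[(user_id, ts.strftime("%Y-%m-%d"))] += 1
        self.total[user_id] += 1


aggregates = CommunicationAggregates()
for event_time in (0, 60, 3600, 86400):
    aggregates.record("u1", event_time)
```

Keeping the three granularities as separate counters means each event costs three increments, which keeps writes cheap at high event rates.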
OFFLINE BATCH JOBS
USER SERVICE
MAP-SIDE JOINS (TB)
SCORING
1+GB Compressed Protocol Buffers
PAIRINGS SERVICE
750M Compressed Protocol Buffers
BILLION+ POTENTIAL MATCHES
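A map-side join avoids a shuffle by loading the smaller dataset into memory and streaming the larger one past it. The sketch below shows the idea in plain Python rather than in a Hadoop job; the record layouts and the `user_id` key are assumptions for illustration.

```python
def map_side_join(large_records, small_table, key):
    """Join a large stream of records against a small in-memory lookup table."""
    # Build the lookup once; in a real job this is the side loaded on every mapper.
    lookup = {row[key]: row for row in small_table}
    for record in large_records:
        match = lookup.get(record[key])
        if match is not None:
            # Fields from the streamed record win on name clashes.
            yield {**match, **record}


# Made-up data: a small user table and a larger stream of candidate pairings.
users = [{"user_id": 1, "region": "CA"}, {"user_id": 2, "region": "NV"}]
pairings = [
    {"user_id": 1, "candidate_id": 7},
    {"user_id": 3, "candidate_id": 8},
    {"user_id": 2, "candidate_id": 9},
]
joined = list(map_side_join(pairings, users, "user_id"))
```

The approach only works when one side fits in memory on each worker, which is why it suits joining a compact lookup table against a very large set of records.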
AMAZON EMR
AWS DIRECT CONNECT
256 NODES
50TB STORAGE
IN-HOUSE SEAMICRO
DATA RETRIEVAL LATENCY
LOW OPERATIONAL COST
LOW POWER CONSUMPTION
PREDICTABLE COMPLETION TIMES
MODEL RETRAINING
distcp
Protocol Buffers from Offline Jobs
MATCH DISTRIBUTION
Delivering the right matches
at the right time to as many
people as possible across
the entire network
THANK YOU
QUESTIONS?
CREDITS:
Visual Elements From The Noun Project
http://thenounproject.com