iRIS: A Large-Scale Food and Recipe Recommendation System Using Spark
Joohyun Kim
Sr. Data Scientist
MyFitnessPal – Under Armour Connected Fitness

Who are we?
• #1 in Health & Fitness
• ~7M food database
• 14.5B+ food logging data
• 85M+ Users
• Workout
• e-Commerce
• Retail
• 120M+ Users
6/16/2015 MyFitnessPal - Under Armour Connected Fitness

Under Armour = Apparel Company?
• http://www.fool.com/investing/general/2015/06/07/how-under-armour-is-becoming-a-tech-company.aspx

Under Armour = Data Company!
Food / Nutrition
• 14.5B+ logged foods
• ~7M foods
• 38M+ recipes
• From 85M+ users
Workout
• Time-series GPS data
• Music data with workout
• Run/Ride/Walk
• Active time of the day
• Sleep pattern
• 390B+ calories burned (MFP)
• 1.2T+ minutes exercise (MFP)
Retail / e-Commerce
• Product purchase transactions
• User preferences on clothes / shoes / wearable devices

Importance of Data

Biggest Concern of Life: What to Eat?
• Taste
• Healthy
• Preference
• Dietary Restriction

Recommender System?
Typical ways of getting recommendations
• Word of mouth
• Reviews / Blogs
• Expert Advice
Limited / Biased / Lack of Source

Collaborative Filtering
• Predict how much a user will like a new item from the prior behavior of users with similar preferences

User – Food Logged Counts Table
            John  Mary  Mike  Jane
Banana        5     3     1     4
Blueberry     3     -     -     2
Apple         -     -     2     -    (?)
Melon         1     -     -     -    (?)

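A minimal sketch of the idea, assuming a user-based neighbourhood scheme with cosine similarity. This is one common form of collaborative filtering, not the matrix factorization the pipeline itself relies on. The names `UserCf`, `cosine`, and `predict` are hypothetical.

```scala
object UserCf {
  // food -> logged count for one user (illustrative representation)
  type Counts = Map[String, Double]

  // Cosine similarity between two users; only foods logged by both contribute to the dot product.
  def cosine(a: Counts, b: Counts): Double = {
    val dot = a.keySet.intersect(b.keySet).toSeq.map(f => a(f) * b(f)).sum
    val na = math.sqrt(a.values.map(x => x * x).sum)
    val nb = math.sqrt(b.values.map(x => x * x).sum)
    if (na == 0.0 || nb == 0.0) 0.0 else dot / (na * nb)
  }

  // Similarity-weighted average of the counts that other users gave `food`.
  // Returns None when no sufficiently similar user has logged it.
  def predict(target: Counts, others: Seq[Counts], food: String): Option[Double] = {
    val votes = others.filter(_.contains(food)).map(o => (cosine(target, o), o(food)))
    val weight = votes.map(_._1).sum
    if (weight > 0.0) Some(votes.map { case (w, r) => w * r }.sum / weight) else None
  }
}
```

With the counts from the table above, predicting John's value for Apple draws only on Mike, the one user who logged Apple.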
Matrix Factorization
• Fill in the missing entries in the ratings (user-food logged counts) matrix
• Formulate as low-rank matrix factorization
  • Factorize the user-item matrix into a user-feature matrix and a feature-item matrix (# features ≪ # users or # items)
R (Users × Items) ≈ X × Yᵀ
Objective
• Minimize RMSE (root mean squared error) between R and X × Yᵀ

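Written out, the objective might look as follows. This is a sketch under assumptions: the set of observed entries Ω, the per-user and per-item factor vectors x_u and y_i, and the symbols m, n, f for the number of users, items, and features are notation introduced here, not taken from the slide.

```latex
\min_{X,\,Y}\; \sqrt{\frac{1}{|\Omega|}\sum_{(u,i)\in\Omega}\left(r_{ui}-x_u^{\top}y_i\right)^{2}},
\qquad R\approx XY^{\top},\quad
X\in\mathbb{R}^{m\times f},\; Y\in\mathbb{R}^{n\times f},\; f\ll m,n
```

Here r_{ui} is the entry of R for user u and food i, and x_u^T y_i is the corresponding entry of X × Yᵀ.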
“Implicit” Matrix Factorization
• Explicit ratings are available for movies / songs
  • Typically 1–5 stars (ratings) given
• For MFP food logging events, there are only “logged” foods: no negative feedback
  • Can’t treat a 0 count (no entry) as negative
• Reference: Hu, Koren, and Volinsky, Collaborative Filtering for Implicit Feedback Datasets, ICDM 2008
• Construct a “binary” ratings matrix P, and factorize P instead of R (the original ratings matrix)

R                P
5 3 ? 3 ?        1 1 ? 1 ?
? 2 ? ? 4        ? 1 ? ? 1
1 ? ? ? ?        1 ? ? ? ?
? ? 1 2 ?        ? ? 1 1 ?
? ? 2 ? 1        ? ? 1 ? 1
? 3 ? ? 1        ? 1 ? ? 1
R ⇒ P ≈ X × Yᵀ

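The construction of P can be sketched as below, following the formulation in the Hu, Koren, and Volinsky paper cited above: each raw count r becomes a binary preference p (1 if logged at all) plus a confidence c = 1 + αr. In that formulation a missing entry gets preference 0 with the minimum confidence of 1, which is how "no entry" avoids being read as strong negative feedback. The names `ImplicitFeedback`, `Observation`, `fromCount`, and `binarize` are hypothetical.

```scala
object ImplicitFeedback {
  // Binary preference p and confidence c derived from one raw logged count r.
  final case class Observation(preference: Double, confidence: Double)

  // p = 1 if the food was logged at least once, else 0; c = 1 + alpha * r.
  def fromCount(count: Double, alpha: Double): Observation = {
    val preference = if (count > 0.0) 1.0 else 0.0
    Observation(preference, 1.0 + alpha * count)
  }

  // Convert one user's row of counts; None marks a food the user never logged.
  def binarize(row: Seq[Option[Int]], alpha: Double): Seq[Observation] =
    row.map(c => fromCount(c.getOrElse(0).toDouble, alpha))
}
```

For the first row of R above (5 3 ? 3 ?), the preferences come out as 1 1 0 1 0, and the more often a food was logged, the higher its confidence.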
Alternating Least Squares
• Optimizing X (user) and Y (item) at the same time is hard
• Fix X or Y ⟹ solve for the other
  • Solve the system of linear equations
  • Take the derivative of the objective function w.r.t. X or Y, set it to 0, and solve
• Start with a random initialization of Y
• EM-like iterative process
  • Iterate until the change is very small (or stop after a fixed number of iterations)

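For the implicit-feedback case, the "fix one side, solve for the other" step has a closed form. The sketch below follows the formulation in Hu, Koren, and Volinsky (ICDM 2008) and is not taken from the slides. Assumed notation: p_{ui} is the binary preference, c_{ui} = 1 + α r_{ui} the confidence, C^u the diagonal matrix of c_{ui} over all items i, p(u) the vector of p_{ui}, and λ a regularization weight.

```latex
% Regularized, confidence-weighted objective
\min_{X,\,Y}\; \sum_{u,i} c_{ui}\left(p_{ui}-x_u^{\top}y_i\right)^{2}
  +\lambda\Big(\sum_u \lVert x_u\rVert^{2}+\sum_i \lVert y_i\rVert^{2}\Big)

% Fix Y, set the derivative w.r.t. x_u to zero, and solve:
x_u=\left(Y^{\top}C^{u}Y+\lambda I\right)^{-1}Y^{\top}C^{u}\,p(u)

% Fix X, and do the same for y_i:
y_i=\left(X^{\top}C^{i}X+\lambda I\right)^{-1}X^{\top}C^{i}\,p(i)
```

Each update is an independent linear solve per user (or per item), which is what makes the procedure easy to distribute.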
Scalability? ⇒ Parallelization!
• Rating matrix: 85M users × 7M foods ≅ 595T entries
  • Impossible to fit on a single machine
  • Sparse representation: billions of entries
• ALS can be easily parallelized with the map-reduce framework
  • Shard user and item vectors
  • Mapper on individual sub-matrix
  • Reducer on aggregation over users/items
• Spark MLlib
  • Parallelized version of ALS ready to use
  • Fast computation with DataFrame
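import org.apache.spark.ml.recommendation.ALS

// ratings: DataFrame with ALS's default columns (user, item, rating); rating is the logged count
val model = new ALS()
  .setRank(20)             // number of latent features
  .setImplicitPrefs(true)  // use the implicit-feedback formulation
  .setAlpha(40)            // confidence scaling for implicit feedback
  .setRegParam(0.1)        // regularization strength
  .setMaxIter(10)          // fixed number of ALS iterations
  .fit(ratings)
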
Food Recommendation Pipeline
Logged foods data (user / food)
→ Predict food preference by matrix factorization
→ Generate top K food recommendations

Generating Top K Recommendations
• We need to serve top recommended foods to users
• With the trained factorized matrix model,
  • Predict top K foods for each user (in the order of their own preference)
• Seems trivial, but the computation is huge
  • For each user, retrieve food preference by R = X × Yᵀ
  • Get top K for each user: min(O(Kmn), O(mn log n))
    • m: # of users, n: # of items
  • Same order as constructing the whole ratings matrix
• Major bottleneck of the entire pipeline
  • No easy way to get around the computation

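The per-user step can be sketched on a single machine as below: score every item with a dot product of factor vectors, and keep the best K in a bounded min-heap. This is an illustrative sketch, not the talk's distributed implementation; `TopK`, `dot`, and `topK` are hypothetical names.

```scala
import scala.collection.mutable

object TopK {
  // Dot product of a user factor vector and an item factor vector.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "factor vectors must have the same rank")
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  // Top-k (item index, score) pairs for one user, best first.
  def topK(user: Array[Double], items: IndexedSeq[Array[Double]], k: Int): List[(Int, Double)] = {
    require(k > 0, "k must be positive")
    // Reversed ordering: the heap's head is the lowest score kept so far.
    val lowestFirst = Ordering.by[(Int, Double), Double](_._2).reverse
    val heap = mutable.PriorityQueue.empty[(Int, Double)](lowestFirst)
    for (i <- items.indices) {
      val score = dot(user, items(i))
      if (heap.size < k) heap.enqueue((i, score))
      else if (score > heap.head._2) {
        heap.dequeue()
        heap.enqueue((i, score))
      }
    }
    // Dequeuing yields ascending scores, so prepending gives best-first order.
    var best = List.empty[(Int, Double)]
    while (heap.nonEmpty) best = heap.dequeue() :: best
    best
  }
}
```

The heap keeps selection cheap, but every user-item score is still computed once, so the slide's point stands: the work is on the order of building the whole ratings matrix.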
Some Numbers
• Spark cluster
  • 72 nodes (1 master + 71 workers)
  • 2TB memory ⟸ One of the largest clusters in production
• Dataset
  • User: 85M+
  • Item (food): ~7M
  • Rating (food log counts): 6.5B+ (aggregated per user/food)
• Time
  • ALS model training: 4 hours
  • Generating top K food recommendations for every user: 48 hours
• More than 20x speed improvement over Mahout on a conventional Hadoop cluster

Advantages Using Spark
• Faster development cycle
  • MLlib
  • Parallelization provided via RDD with abstraction
  • Easy to construct data pipeline with DataFrame
  • Easy to load / export data in and out of S3 / Redshift
• Faster model optimization
  • In-memory, distributed computation
  • Faster model training / testing
  • Significant reduction in time spent on parameter tuning / optimization on the validation dataset
• Easy scalability
  • By launching more worker instances
• Enables frequent model updates
  • Reflect user preference change more often

Sample Food Recommendation
Logged Foods
• Korean soy milk with high calcium
• Coke 12oz
• Fried rice
• Korean mixed grain shake
• Korean Rice Cake
• Sweet Soy Milk
• Blackberries - Raw
• Blueberries - Raw
• Korean Melon (Chameh / 참외)
• Japchae (Korean Stir-Fried Sweet Potato Noodles)
Recommended Foods
• Cooked White Jasmine Rice
• Steamed White Rice (Unenriched)
• Pho
• Kimchi
• Tofu - Fried
• Miso Soup With Seaweed and Tofu
• Shrimp Dumplings
• Miso Soup
• Salmon Nigiri
• Sunny Side Up

Extension: Recipe Recommendation
• Recipes are not “public”
  • Currently, only foods are shared across different users
  • Recipes are “private” to the individual user when created
  • Cannot construct a standard user-item ratings matrix
• Solution: Recommend recipes using similarity with foods
• Advantages of recommending recipes
  • Richer metadata (instructions, ingredients, cuisines, …)
  • Complete food, home-cookable
  • Customizable with personal preference/restriction

Extension: Recipe Recommendation
Food Recommendation
  Logged foods data (user / food)
  → Predict food preference by matrix factorization
  → Generate top K food recommendations
Recipe Recommendation
  → Food – Recipe similarity computation (LSH, kNN, by nutrition)
  → Final top K recipes from food recommendation rank and recipe similarity rank

Nearest Neighbor with Locality Sensitive Hashing
• Naïve nearest neighbor computation: O(n²)
  • Nearly impossible with 7M foods and 38M recipes
• Locality Sensitive Hashing (LSH)
  • Hashes similar items into the same buckets with high probability
  • Similarity metric: Euclidean distance on nutrition vector
  • NN computation much faster, look up within the bucket: O(k²) (k: max size of bucket)
• Spark implementation
  • https://github.com/mrsqueeze/spark-hash

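To make the bucketing idea concrete, here is a single-machine sketch of one standard LSH family for Euclidean distance: project the vector onto random Gaussian directions, add a random offset, and take the bucket index floor((a·v + b) / w). This is an assumption for illustration. It is not the API of the spark-hash library linked above, and the slides do not say which hash family was used. `Lsh`, `EuclideanHasher`, `bucketOf`, and `buckets` are hypothetical names.

```scala
import scala.util.Random

object Lsh {
  // A bucket key is a vector of several hashes h(v) = floor((a . v + b) / w).
  final class EuclideanHasher(dim: Int, numHashes: Int, bucketWidth: Double, seed: Long) {
    private val rng = new Random(seed)
    // Random Gaussian projection directions a.
    private val directions: Vector[Array[Double]] =
      Vector.fill(numHashes)(Array.fill(dim)(rng.nextGaussian()))
    // Random offsets b drawn uniformly from [0, w).
    private val offsets: Vector[Double] =
      Vector.fill(numHashes)(rng.nextDouble() * bucketWidth)

    def bucketOf(v: Array[Double]): Vector[Long] = {
      require(v.length == dim, "vector has the wrong dimension")
      directions.zip(offsets).map { case (a, b) =>
        val projection = a.indices.map(i => a(i) * v(i)).sum
        math.floor((projection + b) / bucketWidth).toLong
      }
    }
  }

  // Group item ids by bucket key; a neighbour search then only compares items sharing a bucket.
  def buckets(items: Map[String, Array[Double]], hasher: EuclideanHasher): Map[Vector[Long], Set[String]] =
    items.toSeq
      .groupBy { case (_, v) => hasher.bucketOf(v) }
      .map { case (key, members) => key -> members.map(_._1).toSet }
}
```

Foods and recipes with nearly identical nutrition vectors tend to share a bucket key, so the pairwise distance computation is confined to each bucket rather than run over all 7M foods and 38M recipes.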
Top K Recipe Recommendation
• Order by sum of food recommendation rank and recipe similarity rank

Rank  Food
1     Banana
2     Blueberry
3     Apple
4     Pad Thai
5     …
6     …

Rank  Recipe (similar to Banana)
1     Banana Pie
2     Banana Smoothie
3     Banana Ice Cream

Rank  Recipe (similar to Blueberry)
1     Blueberry Yogurt
2     Blueberry Pie
3     Salad with Blueberry
…

Combined Rank  Recipe
2              Banana Pie
3              Banana Smoothie
3              Blueberry Yogurt
4              Banana Ice Cream
4              Blueberry Pie
5              …

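The rank combination can be sketched as below. Two details are assumptions made here, not stated on the slide: ties are broken alphabetically by recipe name, and a recipe reachable from several foods keeps its best (lowest) sum. `CombinedRank` and `rankRecipes` are hypothetical names.

```scala
object CombinedRank {
  // foodRank: (food, rank) pairs from the food recommender, 1 = best.
  // similarRecipes: food -> recipes ordered by similarity to that food, most similar first.
  // Returns (recipe, combined rank) pairs sorted by ascending combined rank.
  def rankRecipes(foodRank: Seq[(String, Int)],
                  similarRecipes: Map[String, Seq[String]]): Seq[(String, Int)] = {
    val scored = for {
      (food, fRank) <- foodRank
      (recipe, idx) <- similarRecipes.getOrElse(food, Seq.empty).zipWithIndex
    } yield (recipe, fRank + idx + 1) // similarity rank is the 1-based list position
    scored
      .groupBy(_._1)
      .map { case (recipe, xs) => (recipe, xs.map(_._2).min) }
      .toSeq
      .sortBy { case (recipe, rank) => (rank, recipe) }
  }
}
```

Fed the Banana and Blueberry rows from the slide, this reproduces the combined table: Banana Pie (2), Banana Smoothie (3), Blueberry Yogurt (3), Banana Ice Cream (4), Blueberry Pie (4).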
Sample Recipe Recommendations
Recommended Foods
• Cooked White Jasmine Rice
• Steamed White Rice (Unenriched)
• Pho
• Kimchi
• Tofu - Fried
• Miso Soup With Seaweed and Tofu
• Shrimp Dumplings
• Miso Soup
• Salmon Nigiri
• Sunny Side Up
Recommended Recipes
• Noodle sauce
• Stone Ground Dijon Mustard Marinade
• Citrus Dijon Miso Dressing
• Shirataki Noodle Soup
• dumpling sauce
• Dijon Miracle Whip
• Paleo Vanilla Ice-Crème
• Mable's Chili Burrito
• cake batter milkshake
• Pita pizza

Integrating with Taste Profiles
• Machine learning classifier that outputs a probability distribution over 6 taste categories
  • Savory, Sweet, Sour, Spicy, Salty, Bitter
  • NN classifier performed over feature vector (semantic word vector + numeric nutritional value vector)
  • With a small number (~1400) of labeled foods
  • 87% accuracy on a separately labeled test set (~1200)
• Works as additional metadata for foods
  • Recommendation results can be further filtered / reordered by personal taste preferences
• With Spark, data integration with hash-join is much faster

Problems
• Cold-start problem
  • New user (important)
  • New food item (less of a concern, since new items may not be popular)
• Solution
  • Hybrid with content-based recommendation
    • Construct a basic profile for new users to get baseline recommendations
    • May recommend new items based on feature similarity
  • Run the pipeline more often
    • For new users, recommend top/popular foods
    • After a while, these users will get personalized recommendations

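The "recommend top/popular foods to new users" fallback might be wired up roughly as follows. This is a sketch under assumptions: popularity is taken to be the total logged count per food, and `ColdStart`, `popular`, and `recommend` are hypothetical names rather than parts of the talk's pipeline.

```scala
object ColdStart {
  // Popularity baseline: foods ordered by total logged count, ties broken by name.
  // Each input row is (user, food, logged count).
  def popular(logCounts: Seq[(String, String, Int)]): Seq[String] =
    logCounts
      .groupBy(_._2)
      .map { case (food, rows) => food -> rows.map(_._3).sum }
      .toSeq
      .sortBy { case (food, total) => (-total, food) }
      .map(_._1)

  // Use the model's recommendations when the user has any; otherwise fall back to popular foods.
  def recommend(userId: String,
                personalized: Map[String, Seq[String]],
                popularFoods: Seq[String],
                k: Int): Seq[String] =
    personalized.get(userId).filter(_.nonEmpty).getOrElse(popularFoods).take(k)
}
```

Once a later pipeline run has seen enough of a new user's logs, that user appears in `personalized` and the fallback no longer applies.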
Possible Extension
• Collaborative filtering over aggregated food clusters
  • User-generated foods contain (near) duplicates
• Combine with verified food project
  • Spark-based data deduplication / processing pipeline
  • Construct “verified” foods out of thorough clustering
• Recommend high-quality foods to users
  • Recommend a representative food from each cluster, instead of individual variations
  • Reduces the computation time due to reduced dimension

Application
• Recommend frequently paired foods
  • Pairing foods within a single meal depends on
    • Individual user’s own preference
    • Cultural difference (region, country)
• Simple way
  • Suggest popular foods based on co-occurrence stats per individual user / overall users
• Utilize this framework to better capture personalized preference

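The "simple way" can be sketched as plain co-occurrence counting over meals. The representation of a meal as a set of food names, and the names `FoodPairs`, `coOccurrence`, and `pairedWith`, are assumptions made for illustration.

```scala
object FoodPairs {
  // Count how often two distinct foods are logged in the same meal.
  // Each pair is stored once, with its two names in alphabetical order.
  def coOccurrence(meals: Seq[Set[String]]): Map[(String, String), Int] =
    meals
      .flatMap { meal =>
        val foods = meal.toSeq.sorted
        for (i <- foods.indices; j <- (i + 1) until foods.size) yield (foods(i), foods(j))
      }
      .groupBy(identity)
      .map { case (pair, occurrences) => pair -> occurrences.size }

  // Foods most often paired with `food`, most frequent first, ties broken by name.
  def pairedWith(food: String, counts: Map[(String, String), Int], k: Int): Seq[String] =
    counts.toSeq
      .collect {
        case ((a, b), n) if a == food => (b, n)
        case ((a, b), n) if b == food => (a, n)
      }
      .sortBy { case (other, n) => (-n, other) }
      .take(k)
      .map(_._1)
}
```

Passing one user's meals gives per-user pairing stats, while passing everyone's meals gives the overall stats mentioned on the slide.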
Summary
• Spark-powered machine learning pipeline for food/recipe recommendation system
• Faster computation helps reduce the time spent on the development cycle
  • Helps data scientists focus on core problems
  • Easy extension by attaching additional data processing steps with scalability
• Only scratched the surface
  • Food / recipe recommendation
  • Extensions with other data sources
    • Workout
    • Music
    • Retail
    • e-Commerce

Questions?

 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 

iRIS: A Large-Scale Food and Recipe Recommendation System Using Spark-(Joohyun Kim, MyFitnessPal, Under Armour-Connected Fitness)

  • 1. Joohyun Kim, Sr. Data Scientist, MyFitnessPal – Under Armour Connected Fitness
  • 2. Who are we?
      • #1 in Health & Fitness
      • ~7M food database
      • 14.5B+ food logging data
      • 85M+ users
      • Workout / e-Commerce / Retail
      • 120M+ users
      6/16/2015, MyFitnessPal - Under Armour Connected Fitness
  • 3. Under Armour = Apparel Company?
      • http://www.fool.com/investing/general/2015/06/07/how-under-armour-is-becoming-a-tech-company.aspx
  • 4. Under Armour = Data Company!
      • Food / Nutrition
          • 14.5B+ logged foods
          • ~7M foods
          • 38M+ recipes
          • From 85M+ users
      • Workout
          • Time-series GPS data
          • Music data with workout
          • Run/Ride/Walk
          • Active time of the day
          • Sleep pattern
          • 390B+ calories burned (MFP)
          • 1.2T+ minutes of exercise (MFP)
      • Retail / e-Commerce
          • Product purchase transactions
          • User preferences on clothes / shoes / wearable devices
  • 5. Importance of Data
  • 6. Biggest Concern of Life: What to Eat?
      • Taste
      • Healthy
      • Preference
      • Dietary restriction
  • 7. Recommender System?
      • Typical ways of getting recommendations: word of mouth, reviews / blogs, expert advice
      • Limited, biased, lack of source
  • 8. Collaborative Filtering
      • Predict how much a user may like a new item based on the prior behavior of users with similar preferences
      • User – Food Logged Counts Table

                      John  Mary  Mike  Jane
          Banana       5     3     1     4
          Blueberry    3     -     -     2
          Apple        -     -     2     - (?)
          Melon        1     -     -     - (?)
  • 9. Matrix Factorization
      • Fill in the missing entries of the ratings (user-food logged counts) matrix
      • Formulate as low-rank matrix factorization
          • Factorize the user-item matrix into a user-feature matrix and a feature-item matrix (# features ≪ # users or # items)
          • $R \approx X \times Y^{T}$ (rows of R: users, columns of R: items)
      • Objective: minimize the RMSE (root mean squared error) between $R$ and $X \times Y^{T}$
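Written out, the objective on this slide can be sketched as below. The notation is an assumption for illustration and does not come from the slides: $\Omega$ is the set of observed (user, item) entries, $x_u$ and $y_i$ are the rows of $X$ and $Y$, and $\lambda$ is a regularization weight (the talk sets a regularization parameter later, but does not write out the formula). Minimizing the summed squared error over observed entries is equivalent to minimizing the RMSE over them.

```latex
\min_{X,\,Y} \;
\sum_{(u,i) \in \Omega} \left( r_{ui} - x_u^{T} y_i \right)^2
\;+\; \lambda \left( \sum_{u} \lVert x_u \rVert^2 + \sum_{i} \lVert y_i \rVert^2 \right)
```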
  • 10. “Implicit” Matrix Factorization
      • Explicit ratings are available for movies / songs
          • Typically 1~5 stars (ratings) given
      • For MFP food logging events, there are only “logged” foods, with no negative feedback
          • Can’t treat a 0 count (no entry) as negative
          • Reference: Hu, Koren, and Volinsky, Collaborative Filtering for Implicit Feedback Datasets, ICDM 2008
      • Construct a “binary” ratings matrix P, and factorize P instead of R (the original ratings matrix)
      • R: 5 3 ? 3 ? ? 2 ? ? 4 1 ? ? ? ? ? ? 1 2 ? ? ? 2 ? 1 ? 3 ? ? 1
      • ⇒ P: 1 1 ? 1 ? ? 1 ? ? 1 1 ? ? ? ? ? ? 1 1 ? ? ? 1 ? 1 ? 1 ? ? 1
      • P ≈ X × Y
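As a minimal sketch of the binarization step, the snippet below follows the formulation in the cited Hu, Koren, and Volinsky paper: a logged count r becomes a binary preference p (1 if r > 0, else 0) plus a confidence weight c = 1 + alpha * r. The object and function names are hypothetical, and the sample counts are taken from the user-food table on slide 8; this is not the talk's production code.

```scala
object ImplicitFeedback {
  // Binary preference p_ui: 1.0 if the user logged the food at least once, else 0.0.
  def preference(count: Double): Double = if (count > 0.0) 1.0 else 0.0

  // Confidence c_ui = 1 + alpha * r_ui: more logs means more trust in the preference.
  def confidence(count: Double, alpha: Double): Double = 1.0 + alpha * count

  // Map sparse (user, food) -> count entries to (preference, confidence) pairs.
  def transform(
      counts: Map[(String, String), Double],
      alpha: Double
  ): Map[(String, String), (Double, Double)] =
    counts.map { case (key, r) => (key, (preference(r), confidence(r, alpha))) }
}

// Hypothetical usage with John's row from the slide 8 table.
val johnCounts = Map(
  ("John", "Banana") -> 5.0,
  ("John", "Blueberry") -> 3.0,
  ("John", "Melon") -> 1.0
)
val weighted = ImplicitFeedback.transform(johnCounts, alpha = 40.0)
```

Entries that are absent from the map correspond to the "?" cells: they are not treated as dislikes, only as preference 0 with the minimum confidence of 1.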
  • 11. Alternating Least Squares
      • Optimizing X (user) and Y (item) at the same time is hard
      • Fix X or Y ⟹ solve for the other
          • Solve the system of linear equations
          • Take the derivative of the objective function w.r.t. X or Y, set it to 0, and solve
      • Start with a random initialization of Y
      • EM-like iterative process
      • Iterate until the change is very small (or stop after a fixed number of iterations)
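For the implicit-feedback formulation cited on slide 10, the "fix Y, solve for X" step has a closed form. The sketch below uses the notation of the Hu, Koren, and Volinsky paper rather than anything shown on the slides: $Y$ is the $n \times f$ item-factor matrix, $C^{u}$ is the diagonal matrix of confidences $c_{ui} = 1 + \alpha r_{ui}$, $p(u)$ is the vector of binary preferences for user $u$, and $\lambda$ is the regularization weight. The item update is symmetric, with the roles of $X$ and $Y$ swapped.

```latex
x_u = \left( Y^{T} C^{u} Y + \lambda I \right)^{-1} Y^{T} C^{u} \, p(u),
\qquad
y_i = \left( X^{T} C^{i} X + \lambda I \right)^{-1} X^{T} C^{i} \, p(i)
```

Each user (or item) update depends only on the fixed factors of the other side, which is what makes the step easy to run in parallel.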
  • 12. Scalability? ⇒ Parallelization!
      • Rating matrix: 85M users × 7M foods ≅ 595T entries
          • Impossible to fit in a single machine
          • Sparse representation: billions of entries
      • ALS can be easily parallelized with a map-reduce framework
          • Shard the user and item vectors
          • Mapper on each individual sub-matrix
          • Reducer on the aggregation over users/items
      • Spark MLlib
          • Parallelized version of ALS ready to use
          • Fast computation with DataFrame

        import org.apache.spark.ml.recommendation.ALS

        // `ratings` is assumed to be a DataFrame using the default
        // column names (user, item, rating), with log counts as ratings.
        val model = new ALS()
          .setRank(20)              // number of latent features
          .setImplicitPrefs(true)   // implicit-feedback variant (counts, not stars)
          .setAlpha(40)             // confidence scaling for implicit feedback
          .setRegParam(0.1)         // regularization strength
          .setMaxIter(10)           // fixed number of ALS iterations
          .fit(ratings)
  • 13. Food Recommendation Pipeline
      • Logged foods data (user / food)
      • → Predict food preference by matrix factorization
      • → Generate top K food recommendations
  • 14. Generating Top K Recommendations
      • We need to serve the top recommended foods to users
      • With the trained factorized matrix model,
          • Predict the top K foods for each user (in the order of their own preference)
      • Seems trivial, but the computation is huge
          • For each user, retrieve food preferences by $R = X \times Y^{T}$
          • Get the top K for each user: $\min(O(Kmn),\ O(mn \log n))$
              • m: # of users, n: # of items
          • Same order as constructing the whole ratings matrix
      • Major bottleneck of the entire pipeline
          • No easy way to get around the computation
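The per-user step can be sketched in plain Scala as below: score every item with a dot product of factor vectors, and keep only the K best in a bounded min-heap. The heap is a common alternative to a full sort and is not mentioned on the slide; the object name, function names, and toy factor vectors are all hypothetical. Every item still has to be scored, so the cost the slide describes does not go away.

```scala
import scala.collection.mutable

object TopK {
  // Dot product of two factor vectors of equal length.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < a.length) { sum += a(i) * b(i); i += 1 }
    sum
  }

  // Score every item for one user and keep the k highest-scoring ones.
  def topK(
      user: Array[Double],
      items: Map[String, Array[Double]],
      k: Int
  ): List[(String, Double)] = {
    // PriorityQueue dequeues the largest element under its ordering, so the
    // reversed score ordering makes dequeue() remove the lowest score.
    val byScore = Ordering.by[(String, Double), Double](_._2)
    val heap = mutable.PriorityQueue.empty[(String, Double)](byScore.reverse)
    for ((name, vec) <- items) {
      heap.enqueue((name, dot(user, vec)))
      if (heap.size > k) heap.dequeue() // evict the current worst candidate
    }
    heap.toList.sortBy(p => -p._2) // best score first
  }
}

// Hypothetical two-feature factors for one user and three foods.
val userFactors = Array(1.0, 0.0)
val foodFactors = Map(
  "Banana" -> Array(0.9, 0.2),
  "Melon" -> Array(0.1, 0.8),
  "Apple" -> Array(0.5, 0.5)
)
val best = TopK.topK(userFactors, foodFactors, k = 2)
```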
  • 15. Some Numbers
      • Spark cluster
          • 72 nodes (1 master + 71 workers)
          • 2TB memory (one of the largest clusters in production)
      • Dataset
          • Users: 85M+
          • Items (foods): ~7M
          • Ratings (food log counts): 6.5B+ (aggregated per user/food)
      • Time
          • ALS model training: 4 hours
          • Generating top K food recommendations for every user: 48 hours
          • More than 20x speed improvement over Mahout in a conventional Hadoop cluster
  • 16. Advantages Using Spark
      • Faster development cycle
          • MLlib
          • Parallelization provided via RDD with abstraction
          • Easy to construct a data pipeline with DataFrame
          • Easy to load / export data in and out of S3 / Redshift
      • Faster model optimization
          • In-memory, distributed computation
          • Faster model training / testing
          • Significant reduction in parameter tuning / optimization time on the validation dataset
      • Easy scalability
          • By launching more worker instances
      • Enables frequent model updates
          • Reflect user preference changes more often
  • 17. Sample Food Recommendation
      • Recommended foods
          • Cooked White Jasmine Rice
          • Steamed White Rice (Unenriched)
          • Pho
          • Kimchi
          • Tofu - Fried
          • Miso Soup With Seaweed and Tofu
          • Shrimp Dumplings
          • Miso Soup
          • Salmon Nigiri
          • Sunny Side Up
      • Logged foods
          • Korean soy milk with high calcium
          • Coke 12oz
          • Fried rice
          • Korean mixed grain shake
          • Korean Rice Cake
          • Sweet Soy Milk
          • Blackberries - Raw
          • Blueberries - Raw
          • Korean Melon (Chameh / 참외)
          • Japchae (Korean Stir-Fried Sweet Potato Noodles)
  • 18. Extension: Recipe Recommendation
      • Recipes are not “public”
          • Currently, only foods are shared across different users
          • Recipes are “private” to the individual user who created them
          • Cannot construct a standard user-item ratings matrix
      • Solution: recommend recipes using their similarity with foods
      • Advantages of recommending recipes
          • Richer metadata (instructions, ingredients, cuisines, …)
          • Complete food, home-cookable
          • Customizable with personal preferences/restrictions
  • 19. Extension: Recipe Recommendation
      • Food recommendation
          • Logged foods data (user / food)
          • → Predict food preference by matrix factorization
          • → Generate top K food recommendations
      • Recipe recommendation
          • → Food – recipe similarity computation (LSH, kNN, by nutrition)
          • → Final top K recipes from food recommendation rank and recipe similarity rank
  • 20. Nearest Neighbor with Locality Sensitive Hashing
      • Naïve nearest neighbor computation: $O(n^2)$
          • Nearly impossible with 7M foods and 38M recipes
      • Locality Sensitive Hashing (LSH)
          • Hashes similar items into the same buckets with high probability
          • Similarity metric: Euclidean distance on the nutrition vector
          • NN computation is much faster, with lookup only within the bucket: $O(k^2)$ (k: max size of a bucket)
      • Spark implementation
          • https://github.com/mrsqueeze/spark-hash
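To make the bucketing idea concrete, here is a small plain-Scala sketch of one standard LSH family for Euclidean distance: project the vector onto a random Gaussian direction, add a random offset, and divide the line into cells of width w. This is an illustration only. It is not the API of the spark-hash library linked on the slide, and the slides do not say which hash family was used; all names, the cell width, and the toy nutrition vectors are assumptions.

```scala
import scala.util.Random

object EuclideanLsh {
  // One hash function h(v) = floor((a . v + b) / w), with Gaussian a and b in [0, w).
  final case class Hash(a: Array[Double], b: Double, w: Double) {
    def apply(v: Array[Double]): Int = {
      val projection = a.zip(v).map { case (x, y) => x * y }.sum
      math.floor((projection + b) / w).toInt
    }
  }

  // Draw numHashes independent hash functions from a seeded generator.
  def makeHashes(dim: Int, numHashes: Int, w: Double, seed: Long): Vector[Hash] = {
    val rng = new Random(seed)
    Vector.fill(numHashes)(
      Hash(Array.fill(dim)(rng.nextGaussian()), rng.nextDouble() * w, w)
    )
  }

  // The bucket key is the vector of all hash values.
  def bucket(hashes: Vector[Hash], v: Array[Double]): Vector[Int] =
    hashes.map(h => h(v))

  // Group item names by bucket key; neighbors are then searched per bucket only.
  def buckets(
      hashes: Vector[Hash],
      items: Map[String, Array[Double]]
  ): Map[Vector[Int], Set[String]] =
    items
      .groupBy { case (_, v) => bucket(hashes, v) }
      .map { case (key, group) => (key, group.keySet) }
}

// Hypothetical nutrition vectors (e.g. carbs, protein, fat per serving).
val nutrition = Map(
  "Banana" -> Array(27.0, 1.3, 0.4),
  "Banana Smoothie" -> Array(30.0, 3.0, 1.0),
  "Fried Tofu" -> Array(9.0, 19.0, 20.0)
)
val hashes = EuclideanLsh.makeHashes(dim = 3, numHashes = 4, w = 10.0, seed = 42L)
val grouped = EuclideanLsh.buckets(hashes, nutrition)
```

Whether nearby vectors share a bucket is probabilistic, so practical setups typically use several hash tables and tune w and the number of hashes.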
  • 21. Top K Recipe Recommendation
      • Order by the sum of food recommendation rank and recipe similarity rank
      • Food recommendation rank

          Rank  Food
          1     Banana
          2     Blueberry
          3     Apple
          4     Pad Thai
          5     …
          6     …

      • Recipe similarity rank (Banana)

          Rank  Recipe
          1     Banana Pie
          2     Banana Smoothie
          3     Banana Ice Cream

      • Recipe similarity rank (Blueberry)

          Rank  Recipe
          1     Blueberry Yogurt
          2     Blueberry Pie
          3     Salad with Blueberry

      • Combined rank

          Combined Rank  Recipe
          2              Banana Pie
          3              Banana Smoothie
          3              Blueberry Yogurt
          4              Banana Ice Cream
          4              Blueberry Pie
          5              …
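The rank-sum ordering on this slide can be reproduced with a few lines of plain Scala. The function and object names are hypothetical and the input is the example data from the slide; the sketch assumes each recipe appears under only one food and breaks ties by keeping the input order (Scala's sortBy is stable).

```scala
object CombinedRank {
  // foodRanks: (food, rank) pairs with rank 1 = best.
  // similar: for each food, its recipes ordered from most to least similar.
  // Returns the k recipes with the lowest (food rank + recipe rank).
  def combine(
      foodRanks: Seq[(String, Int)],
      similar: Map[String, Seq[String]],
      k: Int
  ): Seq[(Int, String)] = {
    val scored = for {
      (food, foodRank) <- foodRanks
      (recipe, index) <- similar.getOrElse(food, Seq.empty).zipWithIndex
    } yield (foodRank + index + 1, recipe) // index is 0-based, ranks are 1-based
    scored.sortBy(_._1).take(k)
  }
}

val foodRanks = Seq("Banana" -> 1, "Blueberry" -> 2)
val similarRecipes = Map(
  "Banana" -> Seq("Banana Pie", "Banana Smoothie", "Banana Ice Cream"),
  "Blueberry" -> Seq("Blueberry Yogurt", "Blueberry Pie", "Salad with Blueberry")
)
val topRecipes = CombinedRank.combine(foodRanks, similarRecipes, k = 5)
```

With this input, `topRecipes` lists the same five recipes and combined ranks as the table above.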
  • 22. Sample Recipe Recommendations
      • Recommended recipes
          • Noodle sauce
          • Stone Ground Dijon Mustard Marinade
          • Citrus Dijon Miso Dressing
          • Shirataki Noodle Soup
          • dumpling sauce
          • Dijon Miracle Whip
          • Paleo Vanilla Ice-Crème
          • Mable's Chili Burrito
          • cake batter milkshake
          • Pita pizza
      • Recommended foods
          • Cooked White Jasmine Rice
          • Steamed White Rice (Unenriched)
          • Pho
          • Kimchi
          • Tofu - Fried
          • Miso Soup With Seaweed and Tofu
          • Shrimp Dumplings
          • Miso Soup
          • Salmon Nigiri
          • Sunny Side Up
  • 23. Integrating with Taste Profiles
      • Machine learning classifier that outputs a probability distribution over 6 taste categories
          • Savory, Sweet, Sour, Spicy, Salty, Bitter
          • NN classifier run over a feature vector (semantic word vector + numeric nutritional value vector)
          • Trained on a small number (~1400) of labeled foods
          • 87% accuracy on a separately labeled test set (~1200)
      • Works as additional metadata for foods
          • Recommendation results can be further filtered / reordered by personal taste preferences
      • With Spark, data integration with hash-join is much faster
  • 24. Problems
      • Cold-start problem
          • New user (important)
          • New food item (less of a concern, since new items may not be popular)
      • Solution
          • Hybrid with content-based recommendation
              • Construct a basic profile for new users to get baseline recommendations
              • May recommend new items based on feature similarity
          • Run the pipeline more often
              • For new users, recommend top/popular foods
              • After a while, these users will get their own recommendations
  • 25. Possible Extension
      • Collaborative filtering over aggregated food clusters
          • User-generated foods contain (near) duplicates
      • Combine with the verified food project
          • Spark-based data deduplication / processing pipeline
          • Construct “verified” foods out of thorough clustering
      • Recommend high-quality foods to users
          • Recommend a representative food from each cluster, instead of individual variations
          • Reduces the computation time due to the reduced dimension
  • 26. Application
      • Recommend frequently paired foods
          • Pairing foods within a single meal depends on
              • The individual user’s own preference
              • Cultural differences (region, country)
      • Simple way
          • Suggest popular foods based on co-occurrence stats per individual user / over all users
      • Utilize this framework to capture better personalized preferences
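The "simple way" based on co-occurrence statistics can be sketched as follows, assuming each meal is available as a set of food names. The object, function names, and sample meals are hypothetical; the same counting could be run per user or over all users, as the slide suggests.

```scala
object CoOccurrence {
  // Count how often each unordered pair of foods appears in the same meal.
  // Pairs are stored with the alphabetically smaller name first.
  def pairCounts(meals: Seq[Set[String]]): Map[(String, String), Int] =
    meals
      .flatMap(meal => meal.toSeq.sorted.combinations(2).map(p => (p(0), p(1))))
      .groupBy(identity)
      .map { case (pair, occurrences) => (pair, occurrences.size) }

  // Foods most often paired with `food`, most frequent first, ties by name.
  def topPaired(
      food: String,
      counts: Map[(String, String), Int],
      k: Int
  ): Seq[(String, Int)] =
    counts.toSeq
      .collect {
        case ((a, b), n) if a == food => (b, n)
        case ((a, b), n) if b == food => (a, n)
      }
      .sortBy { case (name, n) => (-n, name) }
      .take(k)
}

// Hypothetical logged meals.
val meals = Seq(
  Set("Rice", "Kimchi"),
  Set("Rice", "Kimchi", "Fried Tofu"),
  Set("Rice", "Miso Soup")
)
val counts = CoOccurrence.pairCounts(meals)
val withRice = CoOccurrence.topPaired("Rice", counts, k = 2)
```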
  • 27. Summary
      • Spark-powered machine learning pipeline for a food/recipe recommendation system
          • Faster computation helps reduce the time spent on the development cycle
          • Helps data scientists focus on core problems
          • Easy extension by attaching additional data processing steps, with scalability
      • Only scratched the surface
          • Food / recipe recommendation
          • Extensions with other data sources
              • Workout
              • Music
              • Retail
              • e-Commerce
  • 28. Questions?