2. Who am I?
Streaming Platform Engineer
playboy.com
Streaming Platform Engineer
netflix.com
Data Solutions Engineer
spark.apache.org, databricks.com
3. What is Spark?
Spark Core
Spark Streaming: real-time
Spark SQL: structured data
MLlib: machine learning
GraphX: graph analytics
…
BlinkDB: approx queries
4. What is Databricks?
Founded by creators of Spark, largest contributor
Offers a hosted service:
Spark on EC2
Notebooks
Plot visualizations
Cluster management
Scheduled jobs
6. What is After Dark?
Goal: Generate high-quality recommendations for its users
Side goal: Demonstrate Spark libraries
Spark Streaming -> Kafka
Spark SQL -> DataFrames
MLlib -> Machine Learning
GraphX -> Graph Analytics
7. Interactive Demo:
Streaming + Spark SQL + DataFrames
① Navigate to sparkafterdark.com
② Click 3 actors and 3 actresses
8. 2 Types of Recommendations
① Non-personalized
No data about the user yet
“Cold Start” problem
② Personalized
Adapt to user preferences and behavior
Recommend from other users with similar
preferences and behavior
9. Non-personalized Recommendations
① Top Users by Like Count
“I might like users who are most-liked overall.”
SparkSQL + DataFrame: Aggregations
② Top Influencers by Like Graph
“I might like the users I am most likely to reach
by randomly following likes across the whole network.”
GraphX: PageRank
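The two non-personalized rankings above can be sketched in plain Python. This is an illustration only, not the Spark SQL or GraphX code from the demo: the `likes` pairs, the function names, and the damping factor of 0.85 are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical sample data: (liker, liked) pairs. Not taken from the talk.
likes = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("dave", "bob"),
    ("bob", "carol"),
    ("alice", "carol"),
    ("carol", "dave"),
]


def top_users_by_like_count(likes, n=3):
    """Count incoming likes per user and return the n most-liked users."""
    counts = Counter(liked for _, liked in likes)
    return counts.most_common(n)


def pagerank(likes, damping=0.85, iterations=100):
    """Power-iteration PageRank over the directed like graph."""
    nodes = sorted({user for edge in likes for user in edge})
    out_links = {user: [] for user in nodes}
    for liker, liked in likes:
        out_links[liker].append(liked)

    rank = {user: 1.0 / len(nodes) for user in nodes}
    for _ in range(iterations):
        # Rank held by users who like nobody is spread evenly over everyone.
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {user: base for user in nodes}
        for user in nodes:
            for liked in out_links[user]:
                new_rank[liked] += damping * rank[user] / len(out_links[user])
        rank = new_rank
    return rank
```

The like count only looks at direct likes, while PageRank also rewards being liked by users who are themselves well liked, so the two rankings can disagree on larger graphs.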
11. Personalized Recommendations
③ Collaborative Filtering
“I like the same people that you like.
What other people do you like that I haven’t seen?”
MLlib: ALS, User-Item Similarity
④ Text Analytics
“Our profiles have similar, unique keywords.
We might like each other.”
MLlib: RowMatrix, Doc Similarity, TF/IDF
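The collaborative filtering idea ("what do you like that I haven't seen?") can be sketched without Spark. Note that this is not ALS: it swaps in a simpler neighbor-based approach that weights other users by Jaccard overlap of their likes. The `likes` data and the function names are hypothetical.

```python
# Hypothetical like data: user -> set of profiles that user has liked.
likes = {
    "me": {"p1", "p2", "p3"},
    "you": {"p1", "p2", "p3", "p4"},
    "other": {"p3", "p5"},
}


def jaccard(a, b):
    """Overlap of two like sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, likes, n=3):
    """Suggest unseen profiles, weighted by how similar each other user is."""
    seen = likes[user]
    scores = {}
    for other, other_likes in likes.items():
        if other == user:
            continue
        weight = jaccard(seen, other_likes)
        for item in other_likes - seen:
            scores[item] = scores.get(item, 0.0) + weight
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]
```

Here `recommend("me", likes)` ranks `p4` ahead of `p5`, because "you" shares three of my likes and "other" shares only one.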
15. More Personalized Recommendations!!
④ Eigenfaces
“Your face looks similar to others that I’ve liked.
I might like you.”
MLlib: RowMatrix, PCA, Item-Item Similarity
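A toy version of the eigenfaces idea, assuming each face is already a flat vector of pixel intensities. The sketch finds only the first principal component by power iteration and compares faces along that one axis; real eigenfaces keep many components over full images. The four-pixel `faces` and all function names are made up for illustration.

```python
import math

# Hypothetical 4-pixel "faces"; real eigenfaces use full image vectors.
faces = {
    "liked_1": [0.9, 0.8, 0.1, 0.2],
    "liked_2": [0.8, 0.9, 0.2, 0.1],
    "new_a": [0.85, 0.85, 0.15, 0.15],
    "new_b": [0.1, 0.2, 0.9, 0.8],
}


def mean_center(rows):
    """Subtract the per-pixel mean so PCA sees variation, not brightness."""
    dims = len(rows[0])
    mean = [sum(row[i] for row in rows) / len(rows) for i in range(dims)]
    return [[row[i] - mean[i] for i in range(dims)] for row in rows], mean


def top_component(centered, iterations=100):
    """First principal component via power iteration on X^T X."""
    dims = len(centered[0])
    v = [1.0 / (i + 1) for i in range(dims)]  # deterministic starting vector
    for _ in range(iterations):
        proj = [sum(row[i] * v[i] for i in range(dims)) for row in centered]
        w = [sum(p * row[i] for p, row in zip(proj, centered)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def rank_by_face_similarity(faces, liked, candidates):
    """Order candidates by distance to the liked faces along the top component."""
    names = list(faces)
    centered, _ = mean_center([faces[name] for name in names])
    component = top_component(centered)
    score = {
        name: sum(x * c for x, c in zip(row, component))
        for name, row in zip(names, centered)
    }
    liked_score = sum(score[name] for name in liked) / len(liked)
    return sorted(candidates, key=lambda name: abs(score[name] - liked_score))
```

Calling `rank_by_face_similarity(faces, ["liked_1", "liked_2"], ["new_a", "new_b"])` puts `new_a` first, since its pixel pattern matches the faces already liked.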
16. More Personalized Recommendations!!!
⑤ Compromise Recommendations (For Couples)
“I want Mad Max. You want Message In a Bottle.
Let’s find something in between.”
MLlib: RowMatrix, Item-Item Similarity
GraphX: Nearest Neighbors, Shortest Path
[Diagram labels: similar plot, similar actor]
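One way to read "find something in between" is as a shortest path through a similarity graph, taking the midpoint as the compromise. The sketch below uses a breadth-first search in plain Python rather than GraphX. The graph edges, including the intermediate titles, are invented for illustration and are not the demo's data.

```python
from collections import deque

# Hypothetical similarity graph: an edge means "similar plot" or "similar actor".
similar = {
    "Mad Max": ["The Road Warrior", "Waterworld"],
    "The Road Warrior": ["Mad Max"],
    "Waterworld": ["Mad Max", "The Postman"],
    "The Postman": ["Waterworld", "Message In a Bottle"],
    "Message In a Bottle": ["The Postman"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes from start to goal."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # the two titles are not connected


def compromise(graph, mine, yours):
    """Pick the title in the middle of the shortest path, if one exists."""
    path = shortest_path(graph, mine, yours)
    if path is None or len(path) < 3:
        return None  # no intermediate title to suggest
    return path[len(path) // 2]
```

With this toy graph, `compromise(similar, "Mad Max", "Message In a Bottle")` returns `"The Postman"`.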
17. More Personalized Recommendations!
⑥ Text Analytics
“Our profiles have similar, unique keywords.
We might like each other.”
MLlib: RowMatrix, Doc Similarity, TF/IDF
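A minimal sketch of keyword-based profile similarity: TF/IDF down-weights words that appear in every profile, and cosine similarity compares the resulting vectors. This is plain Python, not the MLlib pipeline from the demo. The `profiles` text is hypothetical, and the smoothed IDF formula log((N + 1) / (df + 1)) is one common choice, assumed here.

```python
import math
from collections import Counter

# Hypothetical profile text; not taken from the demo.
profiles = {
    "me": "hiking jazz cooking travel",
    "a": "jazz hiking movies travel",
    "b": "movies gaming travel",
}


def tf_idf_vectors(docs):
    """Build a sparse {term: weight} TF/IDF vector for each document."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter(t for tokens in tokenized.values() for t in set(tokens))
    vectors = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens))
            * math.log((n_docs + 1) / (doc_freq[term] + 1))
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Because "travel" appears in every profile, its weight is zero, so "me" and "b" score 0.0 even though they share that word, while "me" and "a" match on the rarer keywords "hiking" and "jazz".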
18. More Personalized Recommendations!!!!
⑦ High-Value Emails
“Your email has similar keywords to my profile.
I might like you for making the effort.”
MLlib: TF/IDF, Entity Recognition, Doc Similarity
[Diagram labels: Her Email, My Profile]
19. Conversation-Starter Bot
⑧ Conversation-Starter Bot
“If your responses to my trite opening lines are positive,
I might actually read your profile.”
MLlib: TF/IDF, DecisionTree, Sentiment
20. Other Talks
① Online Approximate OLAP in SparkSQL
Sameer Agarwal & Kai Zeng, Tues 6/9 @ 3:25pm
② Recipes for Running Streaming Apps in Prod
Tathagata Das, Wed 6/10 @ 12:05pm
③ Practical Distributed ML Pipelines on Hadoop
Joseph Bradley, Wed 6/10 @ 4:35pm
④ Dynamically Allocate Spark Cluster Resources
Andrew Or & Aaron Davidson, Thurs 6/11 @ 11:15am