Abstract: In this paper, I will explain the paper "Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations", which won the Best Long Paper award at RecSys 2020.
MTL addresses the challenge of learning several tasks, e.g. predicting cat breeds and dog breeds, with a single DNN. The basic idea in this type of DNN is to share lower-level features across tasks so that the model learns some general representations. However, such a network would not work well for an unrelated task that does not share the same features, such as, in this context, predicting a car model.
Relatedly, in the field of Recommender Systems we would like to learn different tasks such as the likelihood of clicking, finishing, sharing, favoriting, and commenting. Some of these tasks are often loosely correlated or even conflicting, which may lead to negative transfer. Scenarios arise where MTL models improve certain tasks at the cost of degrading others (the seesaw phenomenon). Related works such as cross-stitch networks, sluice networks, and Multi-gate Mixture-of-Experts address this problem in various ways.
The idea behind this paper is to explicitly separate shared and task-specific experts to avoid harmful parameter interference. On top of this, multi-level experts and gating networks are introduced to fuse more abstract representations. Finally, the model adopts a novel progressive separation routing to model interactions between experts and achieve more efficient knowledge transfer between complicatedly correlated tasks.
Speaker info: Vaibhav Singh currently heads machine learning work in the areas of Fraud Detection, App Personalization and Consumer Growth at Klarna.
2. Who am I
• Name Pronunciation: y bhav
• Currently Head of Machine Learning at Klarna, focusing on Fraud, Shopping App Recommendations and Consumer Growth
• Past Machine Learning Experience in
• Large Scale Image/Ads Moderation
• Credit Risk for P2P Lending
• Moved from Software Engineering to Machine Learning
3. What are we learning today?
● Multi Task Learning
● Mixture of Experts
● MTL in Recommendation Systems
● PLE and CGC in MTL
5. Image Source: KDD2018 video. (2018). Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts [YouTube Video].
Retrieved from https://www.youtube.com/watch?v=Dweg47Tswxw
8. Current Challenges in MTL
• Uncorrelated features
• Performance of the network might be degraded by unrelated features
• Negative Transfer
• Mitigated by multi-gating networks - MMoE - from Google
• Seesaw Phenomenon
• Mitigated by CGC and PLE - from Tencent
10. Mixture of Experts
Image Source: “Lecture 38 Mixture of Experts Neural Network.” SlideServe, 14 Mar. 2019,
www.slideserve.com/quincy-morrow/lecture-38-mixture-of-experts-neural-network-powerpoint-ppt-presentation. Accessed 2 Dec. 2020.
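As a concrete sketch of the idea on this slide: each expert is its own sub-network, and a gating network produces a softmax weight per expert, so the final output is a gate-weighted mixture of the expert outputs. A minimal NumPy toy (all weights are random and purely illustrative, not the paper's implementation):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d_in, d_out, n_experts = 8, 4, 3
x = rng.normal(size=d_in)

# Each expert is a simple linear layer here (toy stand-in for a sub-network).
experts = [rng.normal(size=(d_out, d_in)) for _ in range(n_experts)]

# Gate: linear transformation of the input followed by a softmax.
W_gate = rng.normal(size=(n_experts, d_in))
gate = softmax(W_gate @ x)                        # (n_experts,), sums to 1

# MoE output: gate-weighted sum of the expert outputs.
expert_outs = np.stack([E @ x for E in experts])  # (n_experts, d_out)
y = gate @ expert_outs                            # (d_out,)
```

In MMoE, each task gets its own gate over the same pool of experts, so different tasks can weight the experts differently.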
11. Image Source: Ma, Jiaqi, et al. “Modeling Task Relationships in Multi-Task Learning with Multi-Gate Mixture-of-Experts.” Proceedings of the 24th ACM SIGKDD International Conference Knowledge
Discovery & Data Mining, 19 July 2018, 10.1145/3219819.3220007. Accessed 25 Nov. 2020.
12. Image Source: Tang, Hongyan, et al. “Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations.” Fourteenth ACM Conference
on Recommender Systems, 22 Sept. 2020, 10.1145/3383313.3412236. Accessed 25 Nov. 2020.
Single Level MTL Models
19. Customized Gate Control
20. CGC - Customized Gate Control
● Explicitly separates shared and task-specific layers
● Shared experts and task-specific experts are combined through a gating network for selective fusion
● Output of task k's gating network: g^k(x) = w^k(x) S^k(x)
● w^k(x) = Softmax(W^k_g x) is a weighting function that computes the weight vector of task k through a linear transformation and a softmax layer
● S^k(x) is a selected matrix composed of all selected vectors, i.e. the outputs of the shared experts and of task k's own experts
● Prediction of task k: y^k(x) = t^k(g^k(x)), where t^k denotes the tower network of task k
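The formulas above can be wired up as a toy sketch, assuming plain linear experts, a single task k, and made-up dimensions (names like `S_k`, `w_k` mirror the slide's notation, not any real API):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(1)
d_in, d_exp = 8, 4
x = rng.normal(size=d_in)

# Hypothetical experts: two shared, two specific to task k (linear toys).
shared = [rng.normal(size=(d_exp, d_in)) for _ in range(2)]
task_k = [rng.normal(size=(d_exp, d_in)) for _ in range(2)]

# S^k(x): matrix of the selected expert outputs (shared + task k's own).
S_k = np.stack([E @ x for E in shared + task_k])   # (4, d_exp)

# w^k(x) = Softmax(W^k_g x): linear transformation + softmax.
W_gk = rng.normal(size=(len(S_k), d_in))
w_k = softmax(W_gk @ x)

# g^k(x) = w^k(x) S^k(x): selectively fused representation for task k.
g_k = w_k @ S_k

# Tower t^k: task-specific head on top of the fused representation.
t_k = rng.normal(size=(1, d_exp))
y_k = t_k @ g_k
```

Note that task k's gate never sees the other tasks' experts, which is exactly the explicit separation CGC adds over MMoE.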
21. PLE - Progressive Layered Extraction
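A minimal sketch of the progressive routing, assuming two tasks, two experts per module, and the same representation width at every level. All weights are drawn randomly inside the function purely to show the wiring (the real model learns them, and drops the shared module at the final level):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(2)
d, n_exp = 6, 2  # one representation width everywhere, for simplicity

def extraction_level(h):
    """One PLE extraction level over inputs h = {'task0', 'task1', 'shared'}."""
    shared_outs = [rng.normal(size=(d, d)) @ h['shared'] for _ in range(n_exp)]
    out, everything = {}, list(shared_outs)
    for t in ('task0', 'task1'):
        own = [rng.normal(size=(d, d)) @ h[t] for _ in range(n_exp)]
        everything += own
        sel = np.stack(shared_outs + own)  # task gate sees own + shared experts
        w = softmax(rng.normal(size=(len(sel), d)) @ h[t])
        out[t] = w @ sel
    sel = np.stack(everything)             # shared gate sees ALL experts
    w = softmax(rng.normal(size=(len(sel), d)) @ h['shared'])
    out['shared'] = w @ sel
    return out

x = rng.normal(size=d)
h = {'task0': x, 'task1': x, 'shared': x}  # level 0 takes the raw input
for _ in range(2):                         # two stacked extraction levels
    h = extraction_level(h)
# h['task0'] and h['task1'] would feed the task towers
```

The "progressive" part is the stacking: each level's task-specific and shared outputs become the next level's inputs, so representations separate gradually rather than in a single step.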
23. Loss function for MTL
● Weighted sum of the losses of the individual tasks: L(θ_1, …, θ_K, θ_s) = Σ_k ω_k L_k(θ_k, θ_s)
● MTL loss in practice for Recommendation Systems
○ To train these tasks jointly, the union of the sample spaces of all tasks is taken as the whole training set, and samples outside a task's own sample space are ignored when calculating that task's loss:
○ L_k(θ_k, θ_s) = (1 / Σ_i δ_i^k) Σ_i δ_i^k · loss_k(ŷ_i^k(θ_k, θ_s), y_i^k)
○ where loss_k is task k's loss on sample i, calculated from the prediction ŷ_i^k and the ground truth y_i^k, and δ_i^k ∈ {0, 1} indicates whether sample i lies in the sample space of task k
○ Finally, the loss weight of each task is updated every epoch: ω_k^(t) = ω_k,0 × γ_k^t
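The masked, weighted loss above can be sketched directly, assuming binary cross-entropy per task and made-up batch values; the per-epoch update ω_k^(t) = ω_k,0 · γ_k^t uses hypothetical initial weights and decay ratios:

```python
import numpy as np

def bce(y_hat, y):
    """Per-sample binary cross-entropy, clipped for numerical safety."""
    eps = 1e-7
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Toy batch of 3 samples for 2 tasks (values invented for illustration).
y_hat = np.array([[0.9, 0.2, 0.7], [0.1, 0.8, 0.5]])  # predictions per task
y     = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])  # ground truth per task
delta = np.array([[1, 1, 0], [1, 0, 1]])              # sample-space masks δ_i^k

def task_loss(k):
    # L_k = (1 / Σ_i δ_i^k) Σ_i δ_i^k · loss_k(ŷ_i^k, y_i^k)
    m = delta[k]
    return (m * bce(y_hat[k], y[k])).sum() / m.sum()

omega_0 = np.array([0.5, 0.5])   # initial loss weights ω_k,0 (hypothetical)
gamma   = np.array([0.99, 0.95]) # per-epoch decay ratios γ_k (hypothetical)

epoch = 3
omega_t = omega_0 * gamma ** epoch  # ω_k^(t) = ω_k,0 · γ_k^t
total = sum(w * task_loss(k) for k, w in enumerate(omega_t))
```

Masking with δ_i^k keeps, say, a share-rate task from being penalized on impressions where sharing was never possible.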
24. Links and references
1. Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts MMoE. LINK
2. Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations LINK
3. Lecture 38 Mixture of Experts Neural Network LINK
4. Andrej Karpathy: Tesla Autopilot and Multi-Task Learning for Perception and Prediction VIDEO LINK
5. Andrew Ng Multitask Learning (C3W2L08) VIDEO LINK
6. Keras-MMoE Github