NS-CUK Seminar: J.H.Lee, Review on "Scaling Law for Recommendation Models: Towards General-purpose User Representations", AAAI 2023

30 Mar 2023

  1. Jooho Lee, School of Computer Science and Information Engineering, The Catholic University of Korea. E-mail: jooho414@gmail.com. 2023-03-16
  2. 1 • Introduction (Problem Statement, Contribution) • Method • Experiment • Conclusion
  3. 2 Introduction Problem Statement • Recent advances in large-scale pretrained models such as BERT, GPT-3, CLIP, and Gopher have shown astonishing achievements across various task domains • Unlike vision recognition and language models, studies on general-purpose user representation at scale remain underexplored
  4. 3 Introduction Problem Statement • Can general-purpose user representations learned from multiple source datasets provide a promising transfer-learning capacity? • Are pretraining and downstream-task performance positively correlated? • How many different tasks can the pretrained user representations address? • Does scaling up the pretraining model improve generalization performance? • If so, which factors (training data size, model size, behavior sequence length, batch size) should be scaled up?
  5. 4 Introduction Contribution • Empirical scaling law • Transferability improves as the pretraining error decreases • Transforming tabular data into natural-language text provides a common semantic representation • Advantages of training from multiple service logs
  7. 6 Method CLUE: Model Architecture (model-architecture figures)
  9. 8 Method Transferability improves as the pretraining error decreases • Heterogeneous (multi-domain) dataset • The pretraining loss and the downstream-task loss are strongly correlated (see the sketch below) → lower pretraining loss is good for downstream performance → pretraining generalizes across various data distributions
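A minimal sketch of checking that correlation, assuming pretraining and downstream losses measured at several checkpoints (all numbers below are made-up placeholders, not the paper's measurements):

```python
import numpy as np

# Hypothetical losses at six pretraining checkpoints
# (placeholder values, not from the paper).
pretrain_loss   = np.array([2.8, 2.4, 2.1, 1.9, 1.7, 1.6])
downstream_loss = np.array([0.95, 0.83, 0.74, 0.70, 0.64, 0.61])

# Pearson correlation; a value near +1 supports the claim that
# lower pretraining loss goes with lower downstream-task loss.
r = np.corrcoef(pretrain_loss, downstream_loss)[0, 1]
print(f"Pearson r = {r:.3f}")
```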
  10. 9 Method Transforming tabular data into natural-language text provides a common semantic representation • They transform all data into natural-language text by extracting textual information from tabular data (e.g., product descriptions from a product data table), as sketched below • This policy alleviates discrepancies in data format across different services; the data format of the same product varies depending on the platform
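A minimal sketch of this tabular-to-text policy; the field names, template, and example rows are hypothetical, not taken from the paper:

```python
# Flatten one tabular record into plain text so that logs from
# different services share a single textual format.
def row_to_text(row: dict) -> str:
    parts = []
    for field in ("product_name", "brand", "category", "price"):
        value = row.get(field)  # tolerate missing columns per service
        if value is not None:
            parts.append(f"{field.replace('_', ' ')}: {value}")
    return ", ".join(parts)

# The same product logged by two services with different schemas
service_a = {"product_name": "wireless mouse", "category": "electronics", "price": 19900}
service_b = {"product_name": "wireless mouse", "brand": "ACME", "category": "electronics"}
print(row_to_text(service_a))  # product name: wireless mouse, category: electronics, price: 19900
print(row_to_text(service_b))  # product name: wireless mouse, brand: ACME, category: electronics
```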
  11. 10 Method Advantages of training from multiple service logs • CLUE learns a multi-modal user embedding space from two services and shows promising results on diverse downstream tasks
  13. 12 Method Loss function • CLUE adopts the loss of CLIP: a symmetric cross-entropy over the similarity matrix of paired embeddings (a sketch follows below)
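A minimal PyTorch sketch of this symmetric cross-entropy; the tensor names and temperature are illustrative, and in CLUE the pair would come from two service-specific encoders:

```python
import torch
import torch.nn.functional as F

def clip_style_loss(emb_a: torch.Tensor, emb_b: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    """Symmetric cross-entropy over an N x N cosine-similarity matrix;
    row i and column i form the positive pair."""
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)
    logits = emb_a @ emb_b.t() / temperature          # (N, N) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_ab = F.cross_entropy(logits, targets)        # a -> b direction
    loss_ba = F.cross_entropy(logits.t(), targets)    # b -> a direction
    return (loss_ab + loss_ba) / 2

# Toy usage: a batch of 8 paired embeddings of dimension 128
loss = clip_style_loss(torch.randn(8, 128), torch.randn(8, 128))
```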
  14. 13 Experiment Downstream Tasks • PCR: e-commerce • FWR: web-based cartoon • NVR: news • MMR: marketing messages • OTAR: online travel agency • ICLT: Inter-Company-Level Transfer
  15. 14 Experiment (result figures)
  17. 16 Method Scaling Law and Generalization • "Scaling Laws for Neural Language Models" (OpenAI, arXiv): the early-2020 paper that laid the groundwork for large models in earnest
  18. 17 Method Scaling Law and Generalization • Model architecture • Model size • Compute required for training • Training dataset size
  19. 18 Method Scaling Law and Generalization • Power-law relation: e.g., the number of cities with a given population is inversely proportional to a power of that population (a power law; 멱급수) → a law that determines which factors matter across various models (generic form sketched below) → unlike language models, factors such as batch size and sequence length were important in the user-representation task
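As a hedged illustration, the generic form from the OpenAI paper; the constant $C_0$ and exponent $\alpha$ are fitted empirically, and the exact values for user-representation models are not reproduced here:

```latex
% Loss as a power law of compute C (e.g., PF-days):
L(C) \approx \left(\frac{C_0}{C}\right)^{\alpha}
% so on log-log axes the learning curve is a straight line:
\log L \approx \alpha \log C_0 - \alpha \log C
```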
  20. 19 Method Scaling Law and Generalization (scaling-curve figures)
  24. 23 Conclusion • The success of task-agnostic pretraining in other domains holds for user representation as well • CLUE is trained on billion-scale real-world user behavior data to learn general-purpose user representations • The authors further investigate the empirical scaling laws and the generalization ability of the method, and observe a power-law learning curve as a function of compute (PF-days) in the experiments
