4. Limitation of LightGCN
Limitation 1
The weights 𝛼𝑖𝑘 and 𝛼𝑢𝑣 are unreasonable because the user and item degree factors enter asymmetrically

𝛼𝑖𝑘 = 1 / ((𝑑𝑢 + 1)√((𝑑𝑘 + 1)(𝑑𝑖 + 1)))
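The asymmetry can be made concrete with numbers: the user degree enters the weight linearly, while the item degrees enter only under a square root, so the same degree change is penalized very differently. A minimal Python sketch (the helper name `alpha_ik` and the degrees are illustrative):

```python
import math

# Two-hop LightGCN weight alpha_{ik} linking target item i and an item k
# co-interacted via user u (reconstructed from the slide's formula; the
# helper name and degrees are illustrative).
def alpha_ik(d_u, d_k, d_i):
    return 1.0 / ((d_u + 1) * math.sqrt((d_k + 1) * (d_i + 1)))

base = alpha_ik(d_u=3, d_k=3, d_i=3)
# The user degree enters linearly, the item degrees only under a square
# root, so the same degree change is weighted asymmetrically:
print(base / alpha_ik(d_u=7, d_k=3, d_i=3))  # 2.0: doubling d_u+1 halves the weight
print(base / alpha_ik(d_u=3, d_k=7, d_i=3))  # ~1.414: doubling d_k+1 shrinks it by sqrt(2)
```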
Limitation 2
Message passing combines various relationships via stacked layers
Stacking layers also propagates problematic information, which negatively affects the result
The weight modeling the item-item relationship differs between the target item 𝑖 and the interacted item 𝑘
→ Need to adjust the weights (importance) of the various relationships
5. Limitation of LightGCN
Limitation 3
Stacking more layers → Capture higher-order collaborative signals
LightGCN performs best with 2~3 layers → Over-smoothing problem may occur
From Theorem 1 in GCNⅡ[1], the limit of infinite powers of message passing can be derived (dropping the spectral-gap term of the original statement):

lim_{𝑙→∞} (𝐷^{−1/2}𝐴𝐷^{−1/2})^𝑙_{𝑖,𝑗} = √((𝑑𝑖 + 1)(𝑑𝑗 + 1)) / (2𝑚 + 𝑛)
[1] Simple and Deep Graph Convolutional Networks (ICML’20)
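This limit can be sanity-checked numerically on a toy graph: high powers of the self-loop-normalized adjacency (the self-loop convention matching the 𝑑+1 terms) approach the closed form above. A minimal numpy sketch, where the graph and the power 500 are illustrative choices:

```python
import numpy as np

# Toy check of the GCNII-style limit: powers of D^{-1/2}(A+I)D^{-1/2}
# (self-loop convention, matching the d+1 terms) approach
# sqrt((d_i+1)(d_j+1)) / (2m + n) on a connected graph.
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]    # n = 4 nodes, m = 4 edges
n, m = 4, len(edges)

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
deg = A.sum(axis=1)                          # degrees without self-loops

A_tilde = A + np.eye(n)                      # add self-loops
d_inv_sqrt = 1.0 / np.sqrt(deg + 1)
P = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

limit = np.sqrt(np.outer(deg + 1, deg + 1)) / (2 * m + n)
P_inf = np.linalg.matrix_power(P, 500)       # stand-in for l -> infinity
print(np.max(np.abs(P_inf - limit)))         # prints a value near 0
```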
Motivation: Removing explicit message passing!
6. Proposed method: UltraGCN
Figure 1: UltraGCN: Ultra Simplification of Graph Convolutional Networks for Recommendation (CIKM’21)
Remove explicit message passing
Directly approximate this convergence state

𝑒𝑖 = lim_{𝑙→∞} 𝑒𝑖^{(𝑙+1)} = lim_{𝑙→∞} 𝑒𝑖^{(𝑙)}
Link prediction in graph
8. Learning on user-item graph
Typical prediction in a recommender system → Link prediction on a graph
Possible loss: Pairwise BPR vs Pointwise BCE
𝐿𝑂 = − Σ_{(𝑢,𝑖)∈𝑁+} log 𝜎(𝑒𝑢⊺𝑒𝑖) − Σ_{(𝑢,𝑗)∈𝑁−} log 𝜎(−𝑒𝑢⊺𝑒𝑗)
𝐿 = 𝐿𝑂 + 𝜆𝐿𝐶
The loss above uses only the user-item graph (UltraGCNBase)
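The pointwise BCE objective 𝐿𝑂 is straightforward to express directly from the formula. A minimal numpy sketch (all names, the toy embeddings, and the sampled pairs are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Pointwise BCE loss L_O over positive pairs N+ and sampled negative pairs N-
# (minimal numpy sketch of the slide's formula; all names and data are toys).
def loss_O(E_user, E_item, pos_pairs, neg_pairs):
    pos = sum(np.log(sigmoid(E_user[u] @ E_item[i])) for u, i in pos_pairs)
    neg = sum(np.log(sigmoid(-(E_user[u] @ E_item[j]))) for u, j in neg_pairs)
    return -(pos + neg)

rng = np.random.default_rng(0)
E_user = rng.normal(size=(3, 8))            # 3 users, embedding dim 8
E_item = rng.normal(size=(5, 8))            # 5 items
print(loss_O(E_user, E_item, [(0, 1), (2, 4)], [(0, 3)]))
```

Each term is a negative log-likelihood, so the loss is always positive; with all-zero embeddings every term is −log σ(0) = log 2.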
9. Learning on item-item graph
Limitation 2: Need to adjust weight(importance) of various relationship
UltraGCN does not use explicit message passing → can flexibly adjust the weights on the various relationships
Item-Item co-occurrence graph is useful for recommendation[1]
1. Build the item-item co-occurrence graph 𝐺 = 𝐴⊺𝐴 ∈ ℝ^{|𝐼|×|𝐼|}
2. Approximate the infinite convergence state on 𝐺, following the same procedure used to derive 𝛽𝑢,𝑖
𝑒𝑖 = Σ_{𝑗∈𝑁𝐺(𝑖)} 𝜔𝑖,𝑗 𝑒𝑗, where 𝜔𝑖,𝑗 = (𝐺𝑖,𝑗 / (𝑔𝑖 − 𝐺𝑖,𝑖)) √(𝑔𝑖/𝑔𝑗), 𝑔𝑖 = Σ𝑘 𝐺𝑖,𝑘
[1]: M2GRL: A Multi-task Multi-view Graph Representation Learning Framework for Web-scale Recommender Systems (KDD’20 ads track, oral)
Rather than using all 𝑗 ∈ 𝑁𝐺(𝑖), select top-K most similar items 𝑆(𝑖) based on 𝜔𝑖,𝑗 for training
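The construction of 𝐺, the weights 𝜔𝑖,𝑗, and the top-K selection 𝑆(𝑖) can be sketched end-to-end in a few lines. A minimal numpy sketch on a toy interaction matrix (data and variable names are illustrative):

```python
import numpy as np

# Build the item-item co-occurrence graph G = A^T A from a toy user-item
# interaction matrix A, compute the slide's weights omega_{i,j}, and keep
# each item's top-K neighbors S(i). (Minimal sketch; data is illustrative.)
A = np.array([[1, 1, 0, 1],
              [0, 1, 1, 1],
              [1, 0, 1, 0]], dtype=float)   # 3 users x 4 items
G = A.T @ A
g = G.sum(axis=1)                           # g_i = sum_k G_{i,k}

n_items = G.shape[0]
omega = np.zeros_like(G)
for i in range(n_items):
    for j in range(n_items):
        if j != i and G[i, j] > 0:
            omega[i, j] = G[i, j] / (g[i] - G[i, i]) * np.sqrt(g[i] / g[j])

K = 2
S = {i: list(np.argsort(-omega[i])[:K]) for i in range(n_items)}
print(S[0])                                  # item 0's top-K neighbors by omega
```

In practice 𝐺 is large and sparse, so a sparse matrix product and a per-row top-K (precomputed once before training) would replace the dense double loop.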
10. Learning on item-item graph & Final loss
On the item-item graph, a proper representation of item 𝑖 is

𝑒𝑖 = Σ_{𝑗∈𝑆(𝑖)} 𝜔𝑖,𝑗 𝑒𝑗
For each positive pair (𝑢, 𝑖) ∈ 𝑁+, apply a BCE loss against this converged state of 𝑒𝑖

𝐿𝐼 = − Σ_{(𝑢,𝑖)∈𝑁+} Σ_{𝑗∈𝑆(𝑖)} 𝜔𝑖,𝑗 log 𝜎(𝑒𝑢⊺𝑒𝑗) + (neg. sampling terms)
Final loss
𝐿 = 𝐿𝑂 + 𝜆𝐿𝐶 + 𝛾𝐿𝐼
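The item-item term 𝐿𝐼 simply pulls 𝑒𝑢 toward item 𝑖's weighted neighbors. A minimal numpy sketch with the negative-sampling terms omitted (𝑆, 𝜔, and the embeddings are toy inputs, not the paper's data):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Item-item constraint loss L_I: for each positive pair (u, i), pull e_u
# toward item i's top-K co-occurrence neighbors j in S(i), weighted by
# omega_{i,j}. Minimal sketch; the negative-sampling terms are omitted and
# S, omega, and the embeddings are toy inputs.
def loss_I(E_user, E_item, pos_pairs, S, omega):
    total = 0.0
    for u, i in pos_pairs:
        for j in S[i]:
            total += omega[i, j] * np.log(sigmoid(E_user[u] @ E_item[j]))
    return -total

rng = np.random.default_rng(1)
E_user, E_item = rng.normal(size=(2, 4)), rng.normal(size=(3, 4))
omega = np.array([[0.0, 0.6, 0.4], [0.5, 0.0, 0.5], [0.3, 0.7, 0.0]])
S = {0: [1, 2], 1: [0, 2], 2: [1, 0]}
print(loss_I(E_user, E_item, [(0, 0), (1, 2)], S, omega))
```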
13. Experiments – Ablation study
Checklist
1. Is each part of UltraGCN effective?
2. Is training user-item pairs on the item-item co-occurrence graph better than training item-item pairs?
3. Why not use a user-user co-occurrence graph?
𝐿𝐼′ = − Σ_{(𝑢,𝑖)∈𝑁+} Σ_{𝑗∈𝑆(𝑖)} 𝜔𝑖,𝑗 log 𝜎(𝑒𝑖⊺𝑒𝑗)
14. Experiments – Ablation study
Checklist
1. Is each part of UltraGCN effective?
2. Is training user-item pairs on the item-item co-occurrence graph better than training item-item pairs?
3. Why not use a user-user co-occurrence graph?
Speaker notes
Is it right to treat item k, which the user has already interacted with, differently from item i, whose interaction is being predicted?
These weights pile up layer after layer; is the accumulated result correct?
The original paper says a ±α term appears due to the spectral gap, but it is dropped here
*Spectral gap: difference between the moduli (absolute values) of the two largest eigenvalues of a matrix (Wikipedia)
Message passing is problematic + the limit of infinite powers of message passing exists
→ So why not skip infinite-layer message passing and directly approximate a suitable convergence state?
Training on positive pairs alone can again cause over-smoothing
- Existing GCN-based models address this by reducing the number of layers
Since UltraGCN approximates the limit of infinite-layer message passing, it uses negative sampling instead
In the end, what it looks like is a weighted MF...?!
A user's interests are broader than the properties any single item has, so a user-user graph may struggle to capture user-user relationships well?
- Hence including it has little effect on performance