Uncoupled Regression from
Comparison Data
Liyuan Xu
Gatsby Unit@UCL, Former AIP member
(Twitter: @ly9988)
Disclaimer
This talk is mainly based on our paper at NeurIPS 2019
Introduction
Regression Problem
(Coupled) Data: (x1, y1), (x2, y2), … ∼ P_XY
Learn f(X) ≃ 𝔼[Y|X]
Correspondence in data is assumed
Uncoupled Regression Problem
Uncoupled Data:
x1, x2, x3, … ∼ P_X
y1, y2, y3, … ∼ P_Y
Learn f(X) ≃ 𝔼[Y|X]
Regression without data correspondence
Uncoupled Regression
Uncoupled regression is impossible by itself.
→What is a practically feasible assumption?
Application of Uncoupled Regression
• Merging two datasets [Carpentier+, 2016]
• X : income, Y : housing price
Government
Publish X
Bank
Publish Y
How to merge two datasets
collected independently?
Application of Uncoupled Regression
• Privacy Preserving Machine Learning [Xu et al. 2019]
• Consider that Y contains sensitive information
• Releasing coupled data (Xi, Yi) → security incident
Application of Uncoupled Regression
• Privacy Preserving Machine Learning [Xu et al. 2019]
• Consider that Y contains sensitive information
• Releasing Xi and Yi separately → anonymized data
Data Fusion / Matching
Uncoupled Data with Context Z:
(x1, z1), (x2, z2), … ∼ P_XZ
(y1, z′1), (y2, z′2), … ∼ P_YZ
Learn f(X) ≃ 𝔼[Y|X]
Use contextual data Z to merge the two distributions
→ Data Fusion / Matching
Isometric Uncoupled Regression [Carpentier+, 2016]
Uncoupled Data:
x1, x2, x3, … ∼ P_X
y1, y2, y3, … ∼ P_Y
Learn f(X) ≃ 𝔼[Y|X], assuming 𝔼[Y|X] is monotonic
Monotonicity makes uncoupled regression feasible
Isometric Uncoupled Regression [Carpentier+, 2016]
• Advantage
• Consistency is proved [Rigollet et al. 2018]
→ The optimal model can be learned as the data increases
• Limitation
• The monotonicity assumption may be too strong
• Is housing price Y really monotonic in income X?
• Only applicable to the case X ∈ ℝ
• Need to know the noise distribution
• Solves the problem Y = f*(X) + ε with known P(ε)
High-level concept
Message in [Carpentier+, 2016]
Uncoupled Data + Order Info. → Regression
Order info is provided by the monotonicity assumption
Our Idea
Order info is learned from pairwise comparison data
Uncoupled Data + Order Info. → Regression
Problem Setting
• Pairwise Comparison Data
• Originally considered in ranking context
• Sample two data points (X, Y), (X′, Y′) ∼ P_{X,Y}
• Obtain Pairwise Comparison Data (X+, X−) as
X+ = X, X− = X′ (if Y > Y′)
X+ = X′, X− = X (if Y ≤ Y′)
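To make this sampling scheme concrete, here is a minimal sketch (not from the talk) that simulates pairwise comparison data from coupled samples; the function name and the toy data-generating process are illustrative assumptions.

```python
import numpy as np

def make_pairwise_comparisons(X, Y, m, rng=None):
    """Draw m pairwise comparisons (x+, x-) from coupled samples (X, Y)."""
    rng = np.random.default_rng(rng)
    n = len(Y)
    i = rng.integers(0, n, size=m)          # first point of each pair
    j = rng.integers(0, n, size=m)          # second point of each pair
    winner_is_i = Y[i] > Y[j]               # x+ is the point with the larger y
    x_plus = np.where(winner_is_i[:, None], X[i], X[j])
    x_minus = np.where(winner_is_i[:, None], X[j], X[i])
    return x_plus, x_minus

# Toy usage: X in R^2, Y a noisy linear function of X
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
Y = X @ np.array([1.0, -0.5]) + 0.1 * rng.normal(size=1000)
x_plus, x_minus = make_pairwise_comparisons(X, Y, m=5000, rng=1)
```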
Uncoupled Regression from Pairwise Comparison
Uncoupled Data:
x1, x2, x3, … ∼ P_X
y1, y2, y3, … ∼ P_Y
Pairwise Comparison Data:
(x+_1, x−_1), (x+_2, x−_2), … ∼ P_{X+,X−}
Learn f(X) ≃ 𝔼[Y|X]
Uncoupled Regression from Pairwise Comparison
We propose two approaches:
Risk Approximation & Target Transformation
• Advantage
• No assumption on 𝔼[Y|X]
• No need to know the noise distribution
• Limitation
• Not consistent
• The deviation from the optimal model is bounded
• Empirically it works
Risk Approximation Approach
Formal Problem Settings
• Data given:
• Unlabeled Data: D_X = {x1, x2, …, xn} ∼ P_X
• Target Set: D_Y = {y1, y2, …, yn} ∼ P_Y
• Pairwise Comparison Data: D_{X+,X−} = {(x+_1, x−_1), …, (x+_m, x−_m)} ∼ P_{X+,X−}
• Goal: Find f* that satisfies
f* = arg min_f R(f), where R(f) = 𝔼[(f(X) − Y)²]
Risk Approximation
Loss Decomposition
R(f) = 𝔼_{X,Y}[(f(X) − Y)²]
     = 𝔼_X[f²(X)] − 2𝔼_{X,Y}[Y f(X)] + const.
• 𝔼_X[f²(X)]: estimated from the unlabeled data D_X
• 𝔼_{X,Y}[Y f(X)]: approximated by a linear combination of 𝔼_{X+}[f(X+)] and 𝔼_{X−}[f(X−)]
Risk Approximation
Lemma 1 [Xu et al. 2019]
For any function f,
𝔼_{X+}[f(X+)] = 2𝔼_{X,Y}[F_Y(Y) f(X)]
𝔼_{X−}[f(X−)] = 2𝔼_{X,Y}[(1 − F_Y(Y)) f(X)],
where F_Y is the CDF of Y.
If we can learn w1, w2 such that
Y ≃ 2w1 F_Y(Y) + 2w2 (1 − F_Y(Y)),
then
𝔼_{X,Y}[Y f(X)] ≃ w1 𝔼_{X+}[f(X+)] + w2 𝔼_{X−}[f(X−)]
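The slides state Lemma 1 without proof; as a sanity check, here is a short derivation of the first identity under the assumption that Y is continuous (so ties Y = Y′ have probability zero):

```latex
\begin{aligned}
\mathbb{E}_{X^+}[f(X^+)]
  &= \mathbb{E}\big[f(X)\,\mathbf{1}\{Y > Y'\} + f(X')\,\mathbf{1}\{Y \le Y'\}\big] \\
  &= 2\,\mathbb{E}\big[f(X)\,\mathbf{1}\{Y > Y'\}\big]
     \qquad \text{(the two i.i.d.\ pairs are exchangeable)} \\
  &= 2\,\mathbb{E}_{X,Y}\big[f(X)\,\Pr(Y' < Y \mid Y)\big]
   = 2\,\mathbb{E}_{X,Y}\big[F_Y(Y)\,f(X)\big].
\end{aligned}
```

The second identity follows in the same way, using Pr(Y′ ≥ Y | Y) = 1 − F_Y(Y).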
Risk Approximation
• Risk Approximation
• Step 1: Estimate CDF F̂_Y
• Step 2: Learn weights ŵ1, ŵ2 for the loss
• Step 3: Learn model f̂
Risk Approximation
• Risk Approximation
• Step 1: Estimate CDF F̂_Y
• Step 2: Learn weights ŵ1, ŵ2 for the loss
• Step 3: Learn model f̂
The CDF F_Y is estimated via the empirical CDF of D_Y:
F̂_Y(y) = (1/|D_Y|) Σ_{y_i ∈ D_Y} 1[y_i ≤ y]
Risk Approximation
• Risk Approximation
• Step 1: Estimate CDF F̂_Y
• Step 2: Learn weights ŵ1, ŵ2 for the loss
• Step 3: Learn model f̂
The weights ŵ1, ŵ2 are learned by
ŵ1, ŵ2 = arg min_{w1, w2} Σ_{i=1}^{|D_Y|} (y_i − 2w1 F̂_Y(y_i) − 2w2 (1 − F̂_Y(y_i)))²
Recall, we want Y ≃ 2w1 F_Y(Y) + 2w2 (1 − F_Y(Y))
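A minimal sketch of Steps 1 and 2 (the function names are my own; the slides do not prescribe an implementation): the CDF is estimated by the empirical CDF of D_Y, and (ŵ1, ŵ2) by ordinary least squares.

```python
import numpy as np

def empirical_cdf(d_y):
    """Step 1: empirical CDF F_hat_Y built from the target set D_Y."""
    y_sorted = np.sort(np.asarray(d_y))
    n = len(y_sorted)
    def F_hat(y):
        # fraction of targets <= y
        return np.searchsorted(y_sorted, y, side="right") / n
    return F_hat

def fit_weights(d_y, F_hat):
    """Step 2: least-squares fit of (w1, w2) so that y ≈ 2*w1*F(y) + 2*w2*(1 - F(y))."""
    y = np.asarray(d_y)
    F = F_hat(y)
    A = np.column_stack([2 * F, 2 * (1 - F)])      # design matrix
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w[0], w[1]
```

For Y ∼ Unif[a, b] the fitted weights should come out close to (b/2, a/2), matching the special case in Theorem 2 below.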
Risk Approximation
• Risk Approximation
• Step 1: Estimate CDF F̂_Y
• Step 2: Learn weights ŵ1, ŵ2 for the loss
• Step 3: Learn model f̂
The model f̂ is learned by
f̂ = arg min_f (1/|D_X|) Σ_{i=1}^{|D_X|} f(x_i)² − (2/|D_{X+,X−}|) Σ_{j=1}^{|D_{X+,X−}|} (ŵ1 f(x+_j) + ŵ2 f(x−_j))
(the first term estimates 𝔼_X[f²(X)], the second estimates 2𝔼_{X,Y}[Y f(X)])
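For a linear model f(x) = xᵀθ the empirical objective above is quadratic in θ and can be minimized in closed form. The sketch below is my own construction, not the paper's implementation; it adds a small ridge term for numerical stability and omits an intercept, which could be handled by appending a constant feature.

```python
import numpy as np

def fit_linear_model(d_x, d_pairs, w1, w2, reg=1e-6):
    """Step 3: minimize (1/n) Σ f(x_i)^2 - (2/m) Σ (w1 f(x+_j) + w2 f(x-_j))
    over linear models f(x) = x @ theta."""
    X = np.asarray(d_x)
    Xp, Xm = (np.asarray(a) for a in d_pairs)     # D_{X+,X-} as two arrays
    n, d = X.shape
    # quadratic term: theta^T (X^T X / n) theta, plus a tiny ridge for stability
    A = X.T @ X / n + reg * np.eye(d)
    # linear term: b^T theta with b = (2/m) Σ (w1 x+_j + w2 x-_j)
    b = 2.0 * (w1 * Xp + w2 * Xm).mean(axis=0)
    # gradient 2 A theta - b = 0  =>  theta = 0.5 * A^{-1} b
    return 0.5 * np.linalg.solve(A, b)
```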
Theoretical Property
Theorem 2 [Xu et al. 2019]
For the learned f̂, under some assumptions,
R(f̂) ≤ R(f*) + O_p(1/|D_X|^{1/2} + 1/|D_{X−,X+}|^{1/2}) + M·Err(ŵ1, ŵ2)
Here, Err(w1, w2) is the approximation error
Err(w1, w2) = 𝔼_Y[(Y − 2w1 F_Y(Y) − 2w2 (1 − F_Y(Y)))²]
→ If the loss is approximated well, the bias in the model is small
Theoretical Property
Theorem 2 [Xu et al. 2019]
For the learned f̂, under some assumptions,
R(f̂) ≤ R(f*) + O_p(1/|D_X|^{1/2} + 1/|D_{X−,X+}|^{1/2}) + M·Err(ŵ1, ŵ2)
In particular, if Y ∼ Unif[a, b], then Err(b/2, a/2) = 0
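A quick check of the uniform special case (not shown on the slides): for Y ∼ Unif[a, b] the CDF is F_Y(y) = (y − a)/(b − a) on [a, b], so with (w1, w2) = (b/2, a/2)

```latex
2\cdot\tfrac{b}{2}\,F_Y(Y) + 2\cdot\tfrac{a}{2}\,\big(1 - F_Y(Y)\big)
  = \frac{b\,(Y - a)}{b - a} + \frac{a\,(b - Y)}{b - a}
  = \frac{(b - a)\,Y}{b - a}
  = Y,
```

hence Err(b/2, a/2) = 𝔼_Y[(Y − Y)²] = 0 and the bound reduces to the purely statistical O_p terms.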
Theoretical Property
Theorem 2 [Xu et al. 2019]
For the learned f̂, under some assumptions,
R(f̂) ≤ R(f*) + O_p(1/|D_X|^{1/2} + 1/|D_{X−,X+}|^{1/2}) + M·Err(ŵ1, ŵ2)
In general, Err > 0
① Theoretically, it's inevitable…
② Empirically it works!
Theoretical Property
There exist two distributions
that cannot be distinguished by P_X, P_Y, P_{X+,X−}
Theoretical Property
[Figure: joint probability tables of two distributions P_XY and P̃_XY over (X, Y), with differing cell probabilities such as 1/6, 1/8, 5/24, 1/4, and 1/12]
Same P_X, P_Y, P_{X+,X−}, but 𝔼_P[Y|X] ≠ 𝔼_P̃[Y|X]
Empirical Result
• Learn a linear model on UCI datasets
• Uncoupled regression
• Use all features for D_X, all targets for D_Y
• Note: no correspondence is given
• Generate D_{X+,X−} from 5000 pairs of (X, Y)
• Supervised regression
• Use the entire coupled data
Empirical Result
• MSE of linear models in UCI datasets
→ Can yield almost the same MSE as supervised learning!
Conclusion So Far
• Uncoupled Regression from Pairwise Comparison
• Solve the regression problem given
• Unlabeled data D_X
• Set of target values D_Y
• Pairwise comparison data D_{X+,X−}
• Introduced an approach based on risk approximation
• Theoretical and empirical results are given
Modeling CDF
from Pairwise Comparison Data
Theoretical Property (Recap)
Theorem 2 [Xu et al. 2019]
For the learned f̂, under some assumptions,
R(f̂) ≤ R(f*) + O_p(1/|D_X|^{1/2} + 1/|D_{X−,X+}|^{1/2}) + M·Err(ŵ1, ŵ2)
In particular, if Y ∼ Unif[a, b], then Err(b/2, a/2) = 0
→ We can learn the optimal model when Y is uniform
Predicting Percentile
• Optimize Direct Marketing
• X: customer features, Y: probability of purchase
• Send discount tickets to 1% of potential customers
• The CDF F_Y(Y) is of more interest than Y itself
• Predicting Y might not be the best idea…
• Due to class imbalance, all Y can be very small
Predicting Percentile
• Sometimes the percentile is the target of interest
• Learn f(X) that minimizes R(f) = 𝔼[(F_Y(Y) − f(X))²]
• F_Y(Y) follows Unif[0, 1]
→ We can learn the optimal f from pairwise comparison
Motivating Example for Predicting Percentile
• Online Chess Rating
• X: user attributes, Y: abstract measure of “skill”
• Skill is compared through games
→ Pairwise comparison data is given naturally
• Want to know the percentile in the skill ranking
Simple Solution
• Problem (Recap)
• Given pairwise comparison data (X+, X−)
• Predict the conditional expectation of the CDF, 𝔼[F_Y(Y)|X]
• Simple Solution
• Learn a ranking model r(X) from (X+, X−)
• Transform r(X) into 𝔼[F_Y(Y)|X]
Pairwise-Ranking based Approach
• Pairwise Learning to Rank
• Learn a ranker r(X) which minimizes the rank loss
• e.g. SVMRank, RankBoost
• Given test data X_test and the rank model,
𝔼[F_Y(Y)|X_test] ≃ (rank of X_test in the entire data) / (number of data points)
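A minimal sketch of this rank-to-percentile conversion (the helper name and the use of D_X as the reference pool are my own illustrative choices; any pairwise learning-to-rank scorer could play the role of r):

```python
import numpy as np

def percentile_from_ranker(ranker, x_test, x_pool):
    """Estimate E[F_Y(Y) | x_test] as the rank of a single input x_test
    among a reference pool of inputs.

    ranker : callable mapping an (n, d) array of inputs to an (n,) array of scores
    x_pool : reference inputs (e.g. D_X) defining the empirical ranking
    """
    pool_scores = ranker(np.asarray(x_pool))
    test_score = ranker(np.asarray(x_test)[None, :])[0]
    # fraction of the pool ranked below x_test
    return float(np.mean(pool_scores < test_score))
```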
Weakness in Pairwise-Ranking based Approach
• The original goal is to minimize R(f) = 𝔼_{X,Y}[(f(X) − F_Y(Y))²]
• The rank model r(X) minimizes a rank loss R_r(r)
• A small R_r(r) does not necessarily mean a small R(f)
→ We aim to minimize R(f) directly
Direct Minimization
Lemma 1 [Xu et al. 2019]
For any function h,
𝔼_{X+}[h(X+)] = 2𝔼_{X,Y}[F_Y(Y) h(X)]
𝔼_{X−}[h(X−)] = 2𝔼_{X,Y}[(1 − F_Y(Y)) h(X)]
From this lemma, we have
R(f) = 𝔼_{X,Y}[(f(X) − F_Y(Y))²]
     = 𝔼_X[f²(X)] − 2𝔼_{X,Y}[F_Y(Y) f(X)] + const.
     = 𝔼_X[f²(X)] − 𝔼_{X+}[f(X+)] + const.
Empirical Approximation
• The original loss R(f) (without the constant):
R(f) = 𝔼_X[f²(X)] − 𝔼_{X+}[f(X+)]
• The empirical loss R̂(f):
R̂(f) = (1/|D_X|) Σ_{x_i ∈ D_X} f²(x_i) − (1/|D_{X+,X−}|) Σ_{x+_i ∈ D_{X+,X−}} f(x+_i)
• R(f) ≤ R̂(f) + O_p(1/|D_X|^{1/2} + 1/|D_{X+,X−}|^{1/2})
Summary
• Summary
• We can learn 𝔼[F_Y(Y)|X] only from D_X, D_{X+,X−}
• The empirical loss to minimize is
R̂(f) = (1/|D_X|) Σ_{x_i ∈ D_X} f²(x_i) − (1/|D_{X+,X−}|) Σ_{x+_i ∈ D_{X+,X−}} f(x+_i)
• Can we use this for the original regression problem?
Target Transform Approach
Target Transformation
• From the previous discussion,
• We can learn the optimal model for F_Y(Y)
• We can learn the CDF function F_Y
• Target Transformation Approach [Xu et al. 2019]
1. Learn a function F̂ that minimizes R_F(F) = 𝔼_{X,Y}[(F_Y(Y) − F(X))²]
2. Output the regression model as f̂ = F_Y^{−1}(F̂(X))
Target Transformation
• Target Transformation
• Step 1: Estimate CDF F̂_Y
• Step 2: Learn CDF model F̂
• Step 3: Learn regression model f̂
Target Transformation
• Target Transformation
• Step 1: Estimate CDF F̂_Y
• Step 2: Learn CDF model F̂
• Step 3: Learn regression model f̂
The CDF F_Y is estimated via the empirical CDF of D_Y:
F̂_Y(y) = (1/|D_Y|) Σ_{y_i ∈ D_Y} 1[y_i ≤ y]
Target Transformation
• Target Transformation
• Step 1: Estimate CDF F̂_Y
• Step 2: Learn CDF model F̂
• Step 3: Learn regression model f̂
The model F̂ is learned by
F̂ = arg min_F (1/|D_X|) Σ_{i=1}^{|D_X|} F(x_i)² − (1/|D_{X+,X−}|) Σ_{j=1}^{|D_{X+,X−}|} F(x+_j)
(the first term estimates 𝔼_X[F²(X)], the second estimates 2𝔼_{X,Y}[F_Y(Y) F(X)])
Target Transformation
• Target Transformation
• Step 1: Estimate CDF F̂_Y
• Step 2: Learn CDF model F̂
• Step 3: Learn regression model f̂
The model f̂ is learned by
f̂(X) = F_Y^{−1}(F̂(X))
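Putting the three steps together, here is a minimal sketch (my own construction, not the paper's code) that reuses the hypothetical fit_linear_model helper from the risk-approximation sketch: with weights (w1, w2) = (1/2, 0) its objective becomes exactly (1/n) Σ F(x_i)² − (1/m) Σ F(x+_j), and the empirical quantile function of D_Y stands in for F_Y^{−1}.

```python
import numpy as np

def target_transformation_predict(d_x, d_y, d_pairs, x_new):
    """Target Transformation: fit a linear CDF model F_hat, then invert it
    through the empirical quantile function of D_Y."""
    # Step 2: CDF model via the earlier objective with (w1, w2) = (1/2, 0)
    theta = fit_linear_model(d_x, d_pairs, w1=0.5, w2=0.0)
    # Step 3: predicted percentiles, clipped to [0, 1] since a linear model
    # is not constrained to that range, then mapped through the empirical
    # quantile of D_Y (an empirical stand-in for F_Y^{-1})
    percentiles = np.clip(np.asarray(x_new) @ theta, 0.0, 1.0)
    return np.quantile(np.asarray(d_y), percentiles)
```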
Experiment on UCI
• RA: Risk Approximation
• TT: Target Transformation
• SVMRank: TT approach where F̂ is learned based on SVMRank
Conclusion
• Uncoupled Regression from Pairwise Comparison
• Solve the regression problem given
• Unlabeled data D_X
• Set of target values D_Y
• Pairwise comparison data D_{X+,X−}
• Approach based on risk approximation
• Theoretical and empirical results are given
• Approach based on target transformation
• (Theoretical) and empirical results are given
Thank you!
• Follow me on Twitter! (@ly9988)