19. Mutual Information Common strategy: find $W$ that makes the components of $y = Wx$ as independent as possible. Mutual information is a good independence measure: $y_1, \dots, y_d$ are mutually independent $\Leftrightarrow I(y) = 0$, where $I(y) = \mathrm{KL}\!\left(p(y) \,\middle\|\, \prod_{i=1}^d p_i(y_i)\right)$, $p(y)$: joint distribution of $y$, $p_i(y_i)$: marginal distribution of $y_i$.
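As a minimal numeric sketch of this definition (the 2×2 joint distribution below is purely illustrative, not from the slides), mutual information is the KL divergence between the joint and the product of the marginals, and it vanishes exactly under independence:

```python
import math

# Hypothetical 2x2 joint distribution p(y1, y2); values are illustrative.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginal distributions of y1 and y2.
p1 = {a: sum(p for (i, j), p in joint.items() if i == a) for a in (0, 1)}
p2 = {b: sum(p for (i, j), p in joint.items() if j == b) for b in (0, 1)}

# I(y1; y2) = KL(joint || product of marginals); zero iff independent.
mi = sum(p * math.log(p / (p1[i] * p2[j])) for (i, j), p in joint.items())
print(round(mi, 4))  # prints 0.1927
```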
21. Estimation Method Estimate the density ratio via Legendre–Fenchel convex duality [Nguyen et al. 08]. We can write $\mathrm{KL}(p \,\|\, q) = \sup_g \left\{ \mathbb{E}_p[\log g] - \mathbb{E}_q[g] + 1 \right\}$, where the sup is taken over all positive measurable functions $g$; the optimal function is the density ratio $g^* = p/q$.
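A small numeric check of this variational form, under illustrative discrete distributions `p` and `q` of my own choosing: any positive $g$ gives a lower bound on the KL divergence, and the bound is tight exactly at the density ratio $g^* = p/q$.

```python
import math

# Two hypothetical discrete distributions on {0, 1, 2}.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

# Exact KL divergence KL(p || q).
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def objective(g):
    # E_p[log g] - E_q[g] + 1: a lower bound on KL(p||q) for any g > 0.
    return (sum(pi * math.log(gi) for pi, gi in zip(p, g))
            - sum(qi * gi for qi, gi in zip(q, g)) + 1.0)

# At the optimum g* = p/q (the density ratio) the bound is tight.
g_star = [pi / qi for pi, qi in zip(p, q)]
tight = objective(g_star)

# Any other positive g (here the constant function 1) is looser.
loose = objective([1.0, 1.0, 1.0])
```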
23. Linear model for g Linear model $g(y) = \sum_{l=1}^b \alpha_l \varphi_l(y)$, where $\varphi_l$ is a basis function, e.g., a Gaussian kernel, and a penalty term on $\alpha$ is added for regularization.
25. Gaussian Kernel We use Gaussian kernels for basis functions: $\varphi_l(y) = \exp\!\left(-\frac{\|y - c_l\|^2}{2\sigma^2}\right)$, where the centers $c_l$ are randomly chosen from the sample points. Linear combinations of Gaussian kernels span a broad function class, so the method is distribution-free.
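The basis construction above can be sketched as follows; the sample, bandwidth `sigma`, and number of centers are illustrative choices, not values from the slides:

```python
import math
import random

random.seed(0)

# Toy 1-D sample; sigma and the number of centers are illustrative.
y = [random.gauss(0.0, 1.0) for _ in range(20)]
sigma = 0.5
centers = random.sample(y, 5)  # centers picked randomly from the sample

def phi(v):
    # Gaussian-kernel basis functions centered at the chosen sample points.
    return [math.exp(-(v - c) ** 2 / (2 * sigma ** 2)) for c in centers]

# Design matrix for the linear model g(y) = sum_l alpha_l * phi_l(y).
Phi = [phi(v) for v in y]
```

Each row of `Phi` is one sample expressed in the Gaussian basis; fitting the coefficients $\alpha$ then reduces to a finite-dimensional optimization.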
27. Asymptotic Analysis Regularization parameter: Theorem (nonparametric case): the rate depends on the complexity of the model (large: complex, small: simple), under a bracketing entropy condition. Theorem (parametric case): the asymptotics are expressed by matrices like the Fisher information matrix.
30. Supervised Dimension Reduction Input $x$, output $y$: find a "good" low-dimensional representation $z = Wx$ -> Sufficient Dimension Reduction (SDR): $y \perp\!\!\!\perp x \mid Wx$. A natural choice of $W$: maximize the mutual information between $Wx$ and $y$.
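A toy illustration of why maximizing mutual information is a natural choice of $W$ (the data-generating process and the simple plug-in discrete MI estimator below are my own illustrative constructions): when $y$ depends only on the first coordinate of $x$, projecting onto that coordinate retains far more mutual information with $y$ than projecting onto the other.

```python
import math
import random

random.seed(1)

def discrete_mi(pairs):
    # Plug-in mutual information estimate for pairs of discrete values.
    n = len(pairs)
    pxy, px, py = {}, {}, {}
    for a, b in pairs:
        pxy[(a, b)] = pxy.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    return sum(c / n * math.log((c / n) / (px[a] / n * py[b] / n))
               for (a, b), c in pxy.items())

# y depends only on the first coordinate of the 2-D input x.
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2000)]
labels = [1 if x1 > 0 else 0 for x1, _ in data]

# MI between the label and each 1-D projection (discretized by sign).
mi_e1 = discrete_mi([(1 if x1 > 0 else 0, y) for (x1, _), y in zip(data, labels)])
mi_e2 = discrete_mi([(1 if x2 > 0 else 0, y) for (_, x2), y in zip(data, labels)])
```

Here `mi_e1` is close to $\log 2$ (the projection determines the label) while `mi_e2` is near zero, so an MI-maximizing $W$ would pick the first direction.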
33. Result One-sided t-test with significance level 1%. Mean and standard deviation over 50 trials. Our method performs well.
34. UCI Data Set One-sided t-test with significance level 1%. We choose 200 samples and train an SVM on the low-dimensional representation. Classification error over 20 trials.
38. Sparse Learning Given $n$ samples, minimize a convex loss (hinge, square, logistic) with $L_1$-regularization -> sparsity. Lasso [Tibshirani: JRSS1996]: $\min_\beta \sum_{i=1}^n \ell(y_i, x_i^\top \beta) + \lambda \|\beta\|_1$. Group Lasso [Yuan & Lin: JRSS2006]: the penalty $\sum_I \|\beta_I\|$, where $I$ is a subset of indices (a group).
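A minimal sketch of how the $L_1$ penalty produces sparsity, assuming the standard fact that with an orthonormal design the lasso solution is soft-thresholding of the least-squares coefficients (the coefficient values and `lam` below are illustrative):

```python
def soft_threshold(z, t):
    # Proximal operator of the L1 norm: shrinks z toward 0 by t.
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

# With an orthonormal design, the lasso solution is soft-thresholding
# of the least-squares coefficients; small ones are set exactly to 0.
ols = [2.0, -0.3, 0.7, 0.05]
lam = 0.5
lasso = [soft_threshold(b, lam) for b in ols]
```

The coefficients with magnitude below `lam` become exactly zero, which is the sparsity the slide refers to.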
40. Reproducing Kernel Hilbert Space (RKHS) $\mathcal{H}$: Hilbert space of real-valued functions. $\phi$: map to the Hilbert space such that $f(x) = \langle f, \phi(x) \rangle_{\mathcal{H}}$ (the reproducing property), with reproducing kernel $k(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{H}}$. Representer theorem: the regularized empirical-risk minimizer can be written as $\hat{f} = \sum_{i=1}^n \alpha_i k(\cdot, x_i)$.
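A self-contained sketch of the representer theorem in action, assuming a Gaussian kernel and kernel ridge regression (the data points, bandwidth, and `lam` are illustrative): the solution is a finite kernel expansion whose coefficients solve $(K + \lambda I)\alpha = Y$.

```python
import math

def k(a, b, sigma=1.0):
    # Gaussian reproducing kernel (sigma is an illustrative bandwidth).
    return math.exp(-(a - b) ** 2 / (2 * sigma ** 2))

X = [0.0, 1.0, 2.0]
Y = [0.0, 1.0, 0.0]
lam = 0.1

# Representer theorem: f(x) = sum_i alpha_i k(x, x_i),
# with coefficients solving (K + lam * I) alpha = Y.
n = len(X)
A = [[k(X[i], X[j]) + (lam if i == j else 0.0) for j in range(n)]
     for i in range(n)]

def solve(A, b):
    # Plain Gaussian elimination with partial pivoting.
    A = [row[:] for row in A]
    b = b[:]
    m = len(b)
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, m))) / A[r][r]
    return x

alpha = solve(A, Y)

def f(x):
    # The fitted function lives in the span of k(., x_i).
    return sum(a * k(x, xi) for a, xi in zip(alpha, X))
```

By construction the fit satisfies $f(x_i) + \lambda \alpha_i = y_i$ at every training point, which is just the linear system rewritten.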
41. Moore-Aronszajn Theorem $k$: positive (semi-)definite, symmetric $\Leftrightarrow$ $\mathcal{H}_k$: RKHS with reproducing kernel $k$; the correspondence is one-to-one.
44. Relation to Kernel Weights [Micchelli & Pontil: JMLR2005] Minimize the objective function over convex combinations of kernel functions: given kernels $k_1, \dots, k_M$, $k$ is a convex combination of the $k_m$. The key tool is Young's inequality.
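A numeric sketch of the inner optimization over the kernel weights, assuming the standard multiple-kernel-learning identity (my framing, not stated on the slide) that minimizing $\sum_m \|f_m\|^2 / \beta_m$ over convex weights $\beta$ on the simplex is solved in closed form by $\beta_m \propto \|f_m\|$, with optimal value $(\sum_m \|f_m\|)^2$; the norm values below are illustrative.

```python
# Illustrative RKHS norms ||f_m|| for three candidate kernels.
norms = [3.0, 1.0, 2.0]

# Closed-form optimal convex weights: beta_m proportional to ||f_m||.
total = sum(norms)
beta = [v / total for v in norms]

def objective(b):
    # sum_m ||f_m||^2 / beta_m, the quantity minimized over the simplex.
    return sum(v ** 2 / bm for v, bm in zip(norms, b))

best = objective(beta)  # equals (sum_m ||f_m||)^2
```

Any other convex combination, e.g. uniform weights, gives a strictly larger value, which is what drives the weights toward the more useful kernels.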