# Interpreting Multiple Regression via an Ellipse Inscribed in a Square Extensible to any Finite Dimensionality

2019-09-14, 下野寿之, Meiji University MIMS Joint Research Meeting "Data-driven Mathematical Sciences: Econophysics and Related Topics"


2. Main Claims
    - Multiple regression can be interpreted using a Euclidean ellipse (or ellipsoid / hyper-ellipsoid):
        1. Multiple corr. coeff.: a ratio of lengths of line segments.
        2. Regression coeff.: read off from a linear scalar field.
        3. Partial corr. coeff.: read off from a scale inside the ellipse.
    - These results:
        - make multiple regression easier to understand, both in (1) its numerical results and (2) how they are calculated;
        - may help in solving multicollinearity and other issues.
3. Variables and Geometric Shapes
    - $a_i$: the regression coefficients.
    - $b$: the intercept term of the regression.
    - $c$: a constant scalar.
    - $d$: the number of explanatory variables $X$.
    - $E$: the ellipse inscribed in $S$.
    - $F_i$: a linear scalar field (to read $a_i$).
    - $g_i$: an affine function (to read the partial corr. coeff.).
    - $i$: an index over the explanatory variables, e.g. $X_1, X_2, \dots, X_d$.
    - $j$: a second index.
    - $M$: the correlation matrix over $(X_1, X_2, \dots, X_d)$.
    - $O$: the origin $(0, 0, \dots, 0)$.
    - $P$: the point inside $E$.
    - $q$: a quadratic form.
    - $r_i$, $r_{ij}$: the corr. coeff. between $X_i$ and $Y$, and between $X_i$ and $X_j$.
    - $S$: the unit square / (hyper)cube.
    - $T_i$: the tangent points of $E$ and $S$.
    - $U_i$: the distance of $P$ from the $i$-th axis.
    - $V_i$: the veer (offset) of the ruler parallel to the $i$-th axis.
    - $W_i$: the radius of $E$ along the $i$-th axis.
    - $X = (X_1, \dots, X_d)$: the explanatory variables.
    - $Y$: the response variable.
    - $z$: an arbitrary point inside $E$ or $S$.
4. Variables and Geometric Shapes (short list)
    - $a_i$: the regression coefficients.
    - $d$: the number of explanatory variables $X$.
    - $E$: the ellipse inscribed in $S$.
    - $M$: the correlation matrix over $(X_1, X_2, \dots, X_d)$.
    - $P$: the point inside $E$.
    - $r_i$: the corr. coeff. between $X_i$ and $Y$.
    - $r_{ij}$: the corr. coeff. between $X_i$ and $X_j$.
    - $X = (X_1, \dots, X_d)$: the explanatory variables.
    - $Y$: the response variable.
5. I. Background: About Multiple Regression (4 slides)
6. Linear Combination Modeling Is Widely Used

    $Y = a_1 X_1 + \dots + a_d X_d + b + \text{error}$

    - Multiple regression.
    - Many basic statistical, mathematical, and physical models.
    - Components of machine learning, e.g. deep learning.
7. The Multiple Regression

    $\hat{Y} = a_1 X_1 + a_2 X_2 + \dots + a_d X_d + b$

    - → Regression coeff. $a_i$
    - → Multiple corr. coeff. $\in [0, 1]$
    - → Partial corr. coeff. $\in [-1, 1]$
8. Output of Multiple Regression

    The formulas on this slide are taken from [Kei Takeuchi, Haruo Yanai, Tahenryou Kaiseki No Kiso. Tokyo Keizai Inc. (1972)].
9. Results of Multiple Regression Are, However, Difficult to Interpret
    1. Multiple correlation coefficient:
        - Why does it sometimes take an unexpectedly large value?
    2. Regression coefficient for $X_i$:
        - Why does its ± sign sometimes differ from intuition?
        - Why is it sometimes very large?
    3. Partial correlation coefficient for $X_i$:
        - Why does its ± sign sometimes differ from that of the corr. coeff. between $X_i$ and $Y$?
    - Other issues:
        - Multicollinearity, especially in time series analysis.
        - Instability across samples drawn from the same population.
        - Incomputability when the correlation matrix has one or more negative eigenvalues, which can happen when handling missing values.
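The last issue on the slide above can be illustrated with a small sketch. The matrix below is a hypothetical example, not taken from the slides: each entry is a valid correlation on its own, but the three are mutually inconsistent, as can happen when each pair is estimated from a different subset of rows. With a negative eigenvalue, the quadratic form $z^T M^{-1} z$ is indefinite, so the set where it equals 1 is no longer an ellipsoid.

```python
import numpy as np

# Hypothetical "correlation matrix" of the kind pairwise deletion can produce:
# every entry lies in [-1, 1], but the three values are mutually inconsistent.
M = np.array([[ 1.0, 0.9, -0.9],
              [ 0.9, 1.0,  0.9],
              [-0.9, 0.9,  1.0]])

eigenvalues = np.linalg.eigvalsh(M)           # returned in ascending order
is_positive_definite = bool(eigenvalues[0] > 0)

# Along the offending eigenvector the quadratic form q(z) = z^T M^{-1} z
# is negative, so {z | q(z) = 1} cannot be an ellipsoid.
v = np.array([1.0, -1.0, 1.0])
q_v = float(v @ np.linalg.solve(M, v))

print("smallest eigenvalue:", eigenvalues[0])
print("positive definite:", is_positive_definite)
print("q(v):", q_v)
```

Checking the smallest eigenvalue before drawing $E$ is one simple guard; how to repair such a matrix is outside what the slides cover.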
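10. How Should We Understand the Relationships Among These Quantities?
    - In fact, quantities such as the multiple correlation coefficient can be obtained by geometric construction, without going through difficult formulas.
    - A geometric representation of multiple regression makes it easier to grasp. (I would like the method described next to be widely adopted!)
    - It can give a bird's-eye understanding of various phenomena related to multiple regression.
    - It may allow existing multivariate theory to be reconstructed in a more understandable way.
    - It may also lead to new theories.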
11. II. Three New Theorems: How to Interpret the Results of Multiple Regression Geometrically, via an Ellipse/Ellipsoid in Euclidean Space (7 slides)
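12. Draw $S$, $T_i$, $E$ and $P$ from Correlations
    1. Correlations:
        - $r_i$ is the corr. coeff. between $X_i$ and $Y$.
        - $r_{ij}$ is the corr. coeff. between $X_i$ and $X_j$. (Note: $r_{ij} = 1$ if $i = j$.)
    2. How to make the ellipse and the point:
        - $S$ is the unit square, with corners $(1,1)$, $(1,-1)$, $(-1,-1)$, $(-1,1)$ when $d = 2$.
        - $T_i = (r_{i1}, r_{i2}, \dots, r_{id})$ for $i = 1, \dots, d$.
        - $E$ is inscribed in $S$, touching it at $\pm T_i$ ($i = 1, \dots, d$).
        - $P = (r_1, r_2, \dots, r_d)$.
    3. The matrix $M$ (the corr. matrix on $X$):
        - $M := (T_1 | T_2 | \dots | T_d) = (r_{ij})$; then $E = \{ z \mid z^T M^{-1} z = 1 \}$ holds.

    Figures: for $d = 2$, the ellipse $E$ in the square $S$ with tangent points $\pm T_1$, $\pm T_2$; for $d = 3$, the ellipsoid with $T_1$, $T_2$, $T_3$ and $P$.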
13. Preparing $S$, $E$ and $P$
    - $S$: the square bounded by $x_1 = \pm 1$, $x_2 = \pm 1$, with corners $(1,1)$, $(1,-1)$, $(-1,-1)$, $(-1,1)$.
    - $E$: the ellipse inscribed in $S$, touching it at $(x_1, x_2) = \pm(r_{i1}, r_{i2})$ for $i = 1, 2$, where $r_{ij}$ is the corr. coeff. between $X_i$ and $X_j$.
    - $P$: the point $(x_1, x_2) = (r_1, r_2)$, where $r_i$ is the corr. coeff. between $X_i$ and $Y$.
    - Note: this extends to dimension 3, 4, 5, ... $E$ is given by $\{ x \mid x^T R^{-1} x = 1 \}$, where $R$ is the correlation matrix of $X_1, X_2, \dots, X_d$ (written $M$ on the other slides).
14. Square $S$, Ellipse $E$, Point $P$
    1. Define $d$ (the number of explanatory variables) and set up a $d$-dimensional Euclidean space with axes $x_1, \dots, x_d$.
    2. Draw $S$: bounded by $x_1 = \pm 1$, $x_2 = \pm 1$, ..., $x_d = \pm 1$.
    3. Draw the ellipse $E$ inscribed in $S$ and centered at the origin $O = (0, \dots, 0)$: it touches $S$ at the points $T_1, T_2, \dots, T_d$, which are the columns of the $d \times d$ correlation matrix over $X_1, \dots, X_d$, split as $(T_1 | T_2 | \dots | T_d)$.
    4. Plot the point $P$ inside $E$: its $i$-th coordinate is the correlation coefficient between $X_i$ and $Y$.

    Note: all of the above are determined only by the corr. coeff. of $X_1, \dots, X_d$ and $Y$.
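The construction steps above can be checked numerically. The sketch below uses hypothetical correlation values for $d = 2$ (they are not from the slides) and verifies that each column $T_i$ of $M$ lies on $E$, touches the face $x_i = 1$ of $S$, and that $E$ never leaves $S$. The function name `q` follows the quadratic form used on the Theorem 1 slide.

```python
import numpy as np

# Hypothetical correlations for d = 2 (illustrative values, not from the slides).
r12 = 0.5                       # corr(X1, X2)
M = np.array([[1.0, r12],
              [r12, 1.0]])      # correlation matrix of X; columns are T_1, T_2
P = np.array([0.6, 0.2])        # (corr(X1, Y), corr(X2, Y))

M_inv = np.linalg.inv(M)

def q(z):
    """Quadratic form q(z) = z^T M^{-1} z; E is the level set q(z) = 1."""
    return float(z @ M_inv @ z)

tangent_points = [M[:, i] for i in range(M.shape[0])]

# Each T_i lies on E and touches the face x_i = 1 of the square S.
on_ellipse = [np.isclose(q(T), 1.0) for T in tangent_points]
touches_face = [np.isclose(T[i], 1.0) for i, T in enumerate(tangent_points)]

# E stays inside S: the largest |x_i| attained on E is sqrt(M_ii) = 1.
max_extent = np.sqrt(np.diag(M))

# P lies strictly inside E exactly when q(P) < 1.
p_inside = q(P) < 1.0

print(on_ellipse, touches_face, max_extent, p_inside)
```

Because only $M$ and $P$ enter the code, the same checks run unchanged for any $d$ once a $d \times d$ correlation matrix is supplied.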
15. Theorem 1: Multiple Corr. Coeff. $= |OP| / |OP'|$
    - Let $P'$ be the point with $OP \parallel OP'$ and $P' \in E$.
    - Let $q(z) := z^T M^{-1} z$. (A quadratic form!)
    - Then $q(c\,r) = c^2 q(r)$ for a scalar $c$, i.e. $q(c\,r)^{1/2} = |c|\, q(r)^{1/2}$.
    - Recall $E = \{ z \mid q(z) = 1 \}$: $q(P') = 1$, thus $q(P)^{1/2} = |OP| / |OP'|$, which is the multiple corr. coeff. $R = (P^T M^{-1} P)^{1/2}$.
    - Case of $r_{12} = 0$: $R = (r_1^2 + r_2^2)^{1/2}$.
    - Case of $R = r_1$: iff $r_1 r_{12} = r_2$.
    - Example in the figure: $r_1 = 0.4$, $r_2 = 0$, but $R = 0.8$.

    The multiple corr. coeff. can be considered geometrically!
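A minimal numerical sketch of Theorem 1 follows. The slide gives $r_1 = 0.4$, $r_2 = 0$ and $R = 0.8$ but does not state $r_{12}$; the value $r_{12} = \sqrt{3}/2$ used here is an assumption, chosen because it reproduces $R = 0.8$ from the formula $R = (P^T M^{-1} P)^{1/2}$. The helper name `multiple_corr` is illustrative.

```python
import numpy as np

def multiple_corr(M, P):
    """R = sqrt(P^T M^{-1} P), i.e. |OP| / |OP'| with P' on E along the ray OP."""
    return float(np.sqrt(P @ np.linalg.solve(M, P)))

# Assumed value: r12 = sqrt(3)/2 is chosen so that r1 = 0.4, r2 = 0
# gives the R = 0.8 quoted on the slide; the slide does not state r12.
r12 = np.sqrt(3) / 2
M = np.array([[1.0, r12], [r12, 1.0]])
P = np.array([0.4, 0.0])

R = multiple_corr(M, P)
P_prime = P / R                       # scale P outward until it reaches E
ratio = np.linalg.norm(P) / np.linalg.norm(P_prime)

# Special case r12 = 0: R reduces to sqrt(r1^2 + r2^2).
R_uncorrelated = multiple_corr(np.eye(2), np.array([0.3, 0.4]))

print(R, ratio, R_uncorrelated)
```

Dividing $P$ by $R$ lands exactly on $E$ because $q$ is homogeneous of degree 2, which is the step the slide uses to turn the quadratic form into a length ratio.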
16. Supplementary:
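17. The Multiple Correlation Coefficient Can Be Obtained by Drawing an Ellipse

    Given the correlation coefficient $\rho$ between the explanatory variables (total points scored and total points conceded), draw the ellipse inscribed in the square bounded by $x = \pm 1$, $y = \pm 1$, touching it at the four points $(\pm\rho, \pm 1)$, $(\pm 1, \pm\rho)$. Then plot the point corresponding to the pair $(\rho_1, \rho_2)$ of correlation coefficients between each explanatory variable and the response variable (season ranking). In the figure, the similarity ratio of the two ellipses equals the multiple correlation coefficient. (Either draw an auxiliary line from the origin as shown, or draw a concentric, identically oriented, similar ellipse through the plotted point.) The coefficient of determination is the area ratio of the ellipses. Extension to higher dimensions is easy. With a further device, the partial correlation coefficient can also be obtained.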
18. Theorem 2: Regression Coefficient $a_i$
    - For $i = 1, \dots, d$, consider a linear map $F_i$ such that it gives 0 at $T_1, \dots, T_d$ except $T_i$, gives 1 at $T_i$, and gives $-1$ at $-T_i$.
    - If $X_1, X_2, \dots, X_d$ and $Y$ are standardized, $a_i = F_i(P)$.
    - If not, $a_i = F_i(P) \cdot \mathrm{sd}[Y] / \mathrm{sd}[X_i]$ (sd is the standard deviation).
    - If all of $X_1, X_2, \dots, X_d$ and $Y$ are standardized, $M = (T_1 | T_2 | \dots | T_d)$ can be regarded as a new "coordinate system" with the axes $T_1, T_2, \dots, T_d$; $M^{-1} P$ gives the new coordinates of $P$.

    Figure: the color corresponds to the linear scalar field $F_1$ (blue → $-1$, yellow → 0, red → 1). The color at $P$ gives $a_1$.
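19. Theorem (2): Regression Coeff. $= f_i(P) \cdot \mathrm{sd}(Y) / \mathrm{sd}(X_i)$
    - Let $f_i : \mathbb{R}^d \to \mathbb{R}$ be a linear map with $f_i(T_j) = 1$ if $i = j$ and $f_i(T_j) = 0$ if $i \neq j$.
    - Recall: $T_j$ is the $j$-th column of $R_{X \times X}$, the corr. matrix over $X$.

    Figure: the ellipse with $T_1$, $T_2$, $-T_2$, $-T_1$ and the point $P$.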
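Theorem 2 can be compared against ordinary least squares. The sketch below uses synthetic data (the mixing matrix, true coefficients and seed are arbitrary assumptions) and checks that $M^{-1} P$ matches the least-squares coefficients on standardized variables, and that rescaling by $\mathrm{sd}[Y] / \mathrm{sd}[X_i]$ matches the coefficients on the raw variables.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 3

# Synthetic data (hypothetical): correlated X and a noisy linear response Y.
X = rng.normal(size=(n, d)) @ np.array([[1.0, 0.5, 0.2],
                                        [0.0, 1.0, 0.4],
                                        [0.0, 0.0, 1.0]])
Y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=n)

def standardize(a):
    """Centre to mean 0 and scale to (population) standard deviation 1."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

Xs, Ys = standardize(X), standardize(Y)

M = (Xs.T @ Xs) / n           # correlation matrix of X; columns are T_1..T_d
P = (Xs.T @ Ys) / n           # correlations between each X_i and Y

# F_i is the i-th coordinate in the basis (T_1, .., T_d): F(z) = M^{-1} z.
F_at_P = np.linalg.solve(M, P)

# Reference: least squares on the standardized variables (no intercept needed).
a_std, *_ = np.linalg.lstsq(Xs, Ys, rcond=None)

# Back to original units: a_i = F_i(P) * sd[Y] / sd[X_i].
a_raw = F_at_P * Y.std() / X.std(axis=0)
A = np.column_stack([X, np.ones(n)])          # raw regressors plus intercept
a_ref, *_ = np.linalg.lstsq(A, Y, rcond=None)

print(F_at_P, a_std)
print(a_raw, a_ref[:d])
```

Solving $M a = P$ directly is the algebraic counterpart of reading the value of each $F_i$ at $P$, since $F_i(T_j) = \delta_{ij}$ forces $F(z) = M^{-1} z$.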
20. Supplementary:
21. Theorem 3: Partial Corr. Coeff.
    - Let the rod $Q_i^- Q_i^+$ be the longest segment inside the ellipse $E$ that passes through $P$ and is parallel to the $i$-th axis, oriented in the same direction.
    - Let an affine function $g_i$ satisfy $g_i(-1) = Q_i^-$ and $g_i(+1) = Q_i^+$.
    - Then the partial corr. coeff. between $X_i$ and $Y$ equals $g_i^{-1}(P)$.

    Looking at $P$, the partial correlation coefficient is read off the red scale inscribed in the ellipse. In the figure ($i = 1$, with $Q_1^-$, $Q_1^+$, $W_1$, $U_1$, $V_1$ and $r_1$ marked), the reading is

    $$\frac{r_i - V_i}{W_i \sqrt{1 - U_i^2}}.$$
22. Theorem (3): Partial Correlation $= g_i^{-1}(P)$
    - Let the rod $P_i^- P_i^+$ be the longest segment inside the ellipse $E$ that passes through $P$ and is parallel to the $x_i$-axis, oriented in the same direction.
    - Let an affine function $g_i : \mathbb{R} \to \mathbb{R}^d$ satisfy $g_i(\pm 1) = P_i^{\pm}$.

    The partial correlation coefficient can be read off the red scale at $P$.
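A sketch of Theorem 3 for general $d$: the chord endpoints are found by solving $q(P + t\,e_i) = 1$ for $t$, and the position of $P$ on the resulting $[-1, +1]$ scale is compared with the usual partial correlation obtained from the inverse of the full correlation matrix of $(X, Y)$. The correlation values and both function names are hypothetical, chosen only for illustration.

```python
import numpy as np

def partial_corr_by_chord(M, P, i):
    """Read the partial correlation of X_i and Y off the chord of E through P
    parallel to the x_i-axis: solve q(P + t e_i) = 1 for the two endpoints."""
    M_inv = np.linalg.inv(M)
    a = M_inv[i, i]                   # coefficient of t^2
    b = (M_inv @ P)[i]                # half the coefficient of t
    c = P @ M_inv @ P - 1.0           # q(P) - 1, negative when P is inside E
    half_len = np.sqrt(b * b - a * c) / a
    centre = -b / a                   # chord midpoint, as an offset t from P
    # g_i maps -1 and +1 to the endpoints, so g_i^{-1}(P) is where t = 0 sits.
    return (0.0 - centre) / half_len

def partial_corr_reference(M, P, i):
    """Value from the inverse of the full (d+1) x (d+1) correlation matrix."""
    d = len(P)
    full = np.block([[M, P[:, None]], [P[None, :], np.ones((1, 1))]])
    omega = np.linalg.inv(full)
    return -omega[i, d] / np.sqrt(omega[i, i] * omega[d, d])

# Hypothetical correlations for d = 3 (illustrative values only).
M = np.array([[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
P = np.array([0.5, 0.1, -0.3])

by_chord = [partial_corr_by_chord(M, P, i) for i in range(3)]
reference = [partial_corr_reference(M, P, i) for i in range(3)]

print(by_chord)
print(reference)
```

For $d = 2$ the same chord computation reduces to $(r_1 - r_{12} r_2) / \sqrt{(1 - r_{12}^2)(1 - r_2^2)}$, the familiar two-variable partial correlation formula.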
23. Supplementary:
24. Three Geometric Theorems
    1. Multiple corr. coeff.: it is $|OP| / |OP'|$, where $P'$ is the point at which the line $OP$ crosses $E$.
    2. Regression coeff.: $a_i$ is $f_i(P) \cdot \mathrm{sd}(Y) / \mathrm{sd}(X_i)$ (sd: standard deviation), where the linear functions $f_i : \mathbb{R}^d \to \mathbb{R}$ satisfy $f_i(T_j) = \delta_{ij}$ ($\delta_{ij}$: Kronecker delta) for $i, j \in \{1, 2, \dots, d\}$.
    3. Partial corr. coeff.: let the line segment $P_i^- P_i^+$ be the longest one inside $E$ that passes through $P$ and is parallel to the $x_i$-axis, oriented in the same direction. Fixing the variables $X_1, \dots, X_d$ except $X_i$, the partial corr. coeff. between $X_i$ and $Y$ is $g_i^{-1}(P)$, where the affine function $g_i : \mathbb{R} \to \mathbb{R}^d$ satisfies $g_i(\pm 1) = P_i^{\pm}$.
25. III. More Intuitive, More Geometric Proofs (2 slides)
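26. More Intuitive Proofs
    - W.L.O.G., one can reduce the variables' distribution to a high-dimensional Gaussian distribution.
    - $e := \{ \zeta \mid \zeta^T m^{-1} \zeta = 1 \}$, where $m := \begin{pmatrix} M & P \\ P^T & 1 \end{pmatrix}$.
    - $E$ is a $d$-dimensional ellipsoid; "$e$" is a $(d+1)$-dimensional ellipsoid.

    Figure: the ellipse $E$ with $O$ and $P$ inside the square $S$ with corners $(1,1)$, $(1,-1)$, $(-1,-1)$, $(-1,1)$, shown together with "$e$".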
27. The $(d+1)$-Dimensional Picture
    - $m$ is a $(d+1) \times (d+1)$ matrix.
        - $m =: (t_1 | t_2 | \dots | t_d | t_{d+1})$.
        - $e_i := (0, \dots, 0, 1, 0, \dots, 0)$, where only the $i$-th element is 1.
        - $o := (0, \dots, 0)$ in the $(d+1)$-dimensional space.
    - Consider:
        - Multiple correlation coefficient: the section of "$e$" by the 2-dimensional plane containing $o$, $e_{d+1}$, $t_{d+1}$.
        - Standardized regression coefficient: the inclination of the hyperplane containing $o$, $t_1$, $t_2$, ..., $t_d$.
        - Partial correlation coefficient: the section of "$e$" by the 2-dimensional plane containing $o$, $e_i$, $e_{d+1}$.
    - Then the same conclusions as in the three theorems are obtained!

    Figure: the $(d+1)$-dimensional ellipsoid with $t_1$, $t_2$, $t_{d+1}$, $e_1$, $e_2$, $e_{d+1}$.
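The block matrix $m$ can be explored numerically. The sketch below does not carry out the plane sections described on the slide; it only checks, with hypothetical correlation values, that $m$ alone encodes the same multiple correlation and standardized regression coefficients as the $d$-dimensional pair $(M, P)$, using standard block-inverse identities.

```python
import numpy as np

# Hypothetical correlations (illustrative values only).
M = np.array([[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
P = np.array([0.5, 0.1, -0.3])
d = len(P)

# m = [[M, P], [P^T, 1]]: the correlation matrix over (X_1, .., X_d, Y).
m = np.block([[M, P[:, None]], [P[None, :], np.ones((1, 1))]])
m_inv = np.linalg.inv(m)

# "e" is a genuine ellipsoid only if m is positive definite.
is_ellipsoid = bool(np.all(np.linalg.eigvalsh(m) > 0))

# Quantities from the d-dimensional picture (E and P).
a = np.linalg.solve(M, P)             # standardized regression coefficients
R2 = float(P @ a)                     # squared multiple correlation

# The same quantities recovered from m alone.
R2_from_m = 1.0 - 1.0 / m_inv[d, d]
a_from_m = -m_inv[:d, d] / m_inv[d, d]

print(is_ellipsoid, R2, R2_from_m)
print(a, a_from_m)
```

The identity $1 - R^2 = \det(m) / \det(M)$ also follows from the same block structure and gives a quick consistency check.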
28. IV. Concluding Summary (1 slide)
29. Usefulness of the Theorems
    - The results of multiple regression can be visualized in an easily understandable way when $d = 2$ or $3$.
    - The theorems may open up new theories about linear combination modeling, which address:
        - the interpretation of numerical computation results,
        - instability,
        - multicollinearity,
        - etc.
    - The extension to canonical analysis is expected as a next step.
30. Extra Slides
31. The correlation coefficient:

    $$r[X, Y] = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
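For reference, the sample correlation coefficient $r[X, Y]$ on this slide (Pearson's formula) can be computed directly. This is a minimal sketch with made-up data; the function name `pearson_r` is illustrative, and `np.corrcoef` is used only as a cross-check.

```python
import numpy as np

def pearson_r(x, y):
    """Sample correlation: sum of products of deviations divided by the
    square root of the product of the sums of squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 1.9, 3.8, 4.2, 4.9]

r = pearson_r(x, y)
r_numpy = float(np.corrcoef(x, y)[0, 1])

print(r, r_numpy)
```

Every entry of $M$ and $P$ in the geometric construction is a value of this kind, so nothing beyond pairwise correlations is needed to draw $E$ and $P$.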
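32. Season Total Points Scored vs. Season Ranking

    The correlation coefficient is -0.419.. The more points scored in a season, the higher the ranking and the closer to the championship.
33. Season Total Points Conceded vs. Season Ranking

    The correlation coefficient is +0.471.. The fewer points conceded in a season, the higher the ranking and the closer to the championship.
34. Total Points Scored (x) vs. Total Points Conceded (y)

    The correlation coefficient is +0.423.. (Points scored and points conceded are positively correlated.)
35. Multiple Regression of Ranking on Total Points Scored and Total Points Conceded

    The multiple correlation coefficient is 0.828..
    - Using two explanatory variables improved the prediction accuracy for the response variable (ranking).
    - How should we understand the relationships among these quantities?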
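The three pairwise correlations quoted on the preceding slides are enough to reproduce the multiple correlation coefficient through the ellipse construction. The sketch below plugs in the values as printed (they are truncated on the slides, so the result is only expected to agree with the reported 0.828.. to about two decimals).

```python
import numpy as np

# Correlations as quoted on the slides (truncated to three decimals there).
r1 = -0.419     # total points scored vs. ranking
r2 = 0.471      # total points conceded vs. ranking
rho = 0.423     # total points scored vs. total points conceded

M = np.array([[1.0, rho], [rho, 1.0]])
P = np.array([r1, r2])

a = np.linalg.solve(M, P)           # standardized regression coefficients
R = float(np.sqrt(P @ a))           # multiple correlation = |OP| / |OP'|
R_squared = R ** 2                  # coefficient of determination (area ratio)

print("R =", R)
print("R^2 =", R_squared)
print("standardized coefficients:", a)
```

Here $R$ is noticeably larger than either $|r_1|$ or $|r_2|$: the two explanatory variables are positively correlated with each other but have opposite-sign correlations with the ranking, which places $P$ toward the narrow direction of the ellipse.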