Nonnegative Matrix Factorization with Side Information for Time Series Recovery and Prediction by Jiali Mei, Researcher @Shift Technology


Abstract: Motivated by the recovery and prediction of electricity consumption time series, we extend Nonnegative Matrix Factorization (NMF) to take into account external features as side information. We consider general linear measurement settings, and propose a framework which models non-linear relationships between external features and the response variable. We extend previous theoretical results to obtain a sufficient condition for the identifiability of NMF with side information. Based on the classical Hierarchical Alternating Least Squares (HALS) algorithm, we propose a new algorithm (HALSX, or Hierarchical Alternating Least Squares with eXogenous variables) which estimates NMF in this setting. The algorithm is validated on both simulated and real electricity consumption datasets, as well as a recommendation-system dataset, to show its performance in matrix recovery and in prediction for new rows and columns.


Nonnegative Matrix Factorization with Side Information for Time Series Recovery and Prediction

Jiali Mei (4), Yohann De Castro (1), Yannig Goude (1,2), Jean-Marc Azaïs (3), Georges Hébrail (2)

1. LMO, Univ. Paris-Sud, CNRS, Université Paris-Saclay, Orsay
2. EDF Lab Paris-Saclay, Palaiseau
3. Institut de Mathématiques, Université Paul Sabatier, Toulouse
4. Shift Technology, Paris

September 26, 2018
Context

Utility companies are interested in electricity consumption data of small regions (village, block, small city) on a fine temporal scale. This is useful in several ways:
- Useful for utility companies to manage the supply-demand balance locally, in a world with decentralized electricity generation (wind and solar power) and an open electricity market;
- A requirement imposed on transmission system operators (TSO) by regulators;
- Generally useful information to better understand socio-economic activities at a fine temporal level.

Are there enough data for doing this?
Motivating example 1: data from meters

Figure: Electricity meter readings. Figure: Daily electricity consumption.

Traditional electricity meters need to be read physically, therefore at a lower frequency than needed for further applications. The resulting data are asynchronous, since meter reading dates are not aligned across clients. Such data are difficult to process further.
Motivating example 2: data from the electricity network

Figure: Map of the 7th Arrondissement of Lyon and the low-voltage transformers in this district.

Load data can be available at a high temporal frequency on the electricity network, but at a different or coarser spatial scale than what is needed.
Motivating example 3: electricity consumption and external factors

Figure: Portuguese electricity consumption versus day of the week and temperature (left panel: influence of calendar variables; right panel: influence of the temperature).

It is well established that electricity consumption is influenced by many external factors.
A matrix representation

Figure: A matrix representation of the estimation target of the thesis.

Variable of interest: v_{i,j} is the electricity consumption at period i for individual j, with n1 periods and n2 individuals in total. We write V ∈ R^{n1×n2} for the whole matrix, and (v^i)^T and v_j for the i-th row and the j-th column.
Main questions

Figure: A matrix representation of the estimation target of the thesis.

- How can we estimate all entries of the matrix V from temporal aggregates and/or spatial aggregates?
- Can the use of additional information, such as temporal regularity and exogenous variables, improve such estimations?
- Is it possible to produce predictions of electricity consumption for new periods and new individuals with such data?
Outline

Method: Nonnegative matrix factorization with side information
- NMF with linear measurements
- Time series recovery and prediction with side information
- HALSX algorithm

Experiments
- Time series recovery
- Time series prediction

Conclusions
Nonnegative matrix factorization

We propose to solve the estimation problem by nonnegative matrix factorization (NMF, Lee and Seung 1999).
- Based on the hypothesis that the matrix to be recovered is of low rank.
- All entries in the factor matrices are nonnegative.
- A dimension-reduction tool, similar to Singular Value Decomposition (SVD), Principal Component Analysis (PCA), etc.

Remarks on nonnegativity
- Why? For the electricity application, nonnegative consumption profiles and weights are much more interpretable.
- Price to pay: fewer convergence guarantees.
Trace regression model

We wish to recover a matrix V*, with the knowledge of data a ∈ R^D, which are linear measurements on the unknown matrix V*:

    a = A(V*),

where A is a known linear operator. The linear operator A is identified by A_1, ..., A_D, D matrices or "masks" of the same dimension as V*. For any matrix X ∈ R^{n1×n2},

    A(X) ≡ (Tr(A_1^T X), Tr(A_2^T X), ..., Tr(A_D^T X))^T.

Hence the name trace regression model (Rohde and Tsybakov 2011).

Usual types of measurement operator A:
- complete observations
- matrix completion (Candès and Recht 2009)
- matrix sensing (Recht, Fazel, and Parrilo 2010)
- rank-one matrix projections (Zuk and Wagner 2015)
- temporal aggregates
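As a concrete illustration of these masks (the helper function and the example values below are mine, not from the slides), a temporal-aggregate measurement corresponds to a mask A_d that equals 1 on a run of consecutive periods of a single column and 0 elsewhere:

```python
import numpy as np

def trace_measurements(V, masks):
    """Apply the trace-regression operator: a_d = Tr(A_d^T V)."""
    return np.array([np.trace(A_d.T @ V) for A_d in masks])

# Example: a mask aggregating periods 0-2 of individual 0.
n1, n2 = 6, 2
V = np.arange(n1 * n2, dtype=float).reshape(n1, n2)
A1 = np.zeros((n1, n2))
A1[0:3, 0] = 1.0
a = trace_measurements(V, [A1])
# a[0] equals V[0, 0] + V[1, 0] + V[2, 0]
```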
Temporal aggregation patterns

In the electricity application, the masks correspond to the temporal aggregation patterns.
Classical NMF algorithms with linear measurements

We minimize the quadratic approximation error, with a linear equality constraint:

    min_{V ∈ R^{n1×n2}, Fr ∈ R^{n1×k}, Fc ∈ R^{n2×k}} ||V − Fr Fc^T||_F^2
    s.t. V ≥ 0, Fr ≥ 0, Fc ≥ 0, A(V) = a.

We solve it by combining classical iterative NMF algorithms, such as HALS or NeNMF (Cichocki 2009; Guan et al. 2012), with a projection step:

    V = P_A(Fr Fc^T),

where P_A is the projection operator onto the convex set A defined by the two constraints V ≥ 0 and A(V) = a.

Data: P_A, 1 ≤ k ≤ min{n1, n2}
Result: V ∈ A, Fr ∈ R^{n1×k}_+, Fc ∈ R^{n2×k}_+
Initialize F^0_r, F^0_c ≥ 0, V^0 = P_A(F^0_r (F^0_c)^T), i = 0;
while stopping criterion is not satisfied do
    F^{i+1}_r = Update(F^i_r, (F^i_c)^T, V^i);
    (F^{i+1}_c)^T = Update(F^{i+1}_r, (F^i_c)^T, V^i);
    V^{i+1} = P_A(F^{i+1}_r (F^{i+1}_c)^T);
    i = i + 1;
end

Limiting points are stationary points, as for most NMF algorithms.
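A minimal numerical sketch of this scheme (an illustration of my own, assuming a generic projector P_A passed in as a function; the inner update here is a plain HALS sweep, not necessarily the authors' exact implementation):

```python
import numpy as np

def hals_update(F, G, V):
    """One HALS sweep over the columns of F, for min ||V - F G^T||_F^2 with F >= 0."""
    R = V - F @ G.T
    for r in range(F.shape[1]):
        Rr = R + np.outer(F[:, r], G[:, r])        # residual excluding rank r
        denom = G[:, r] @ G[:, r] + 1e-12
        F[:, r] = np.maximum(0.0, Rr @ G[:, r] / denom)
        R = Rr - np.outer(F[:, r], G[:, r])
    return F

def nmf_with_measurements(project, n1, n2, k, n_iter=200, seed=0):
    """Alternate HALS updates of Fr, Fc with the projection step V = P_A(Fr Fc^T)."""
    rng = np.random.default_rng(seed)
    Fr = rng.random((n1, k))
    Fc = rng.random((n2, k))
    V = project(Fr @ Fc.T)
    for _ in range(n_iter):
        Fr = hals_update(Fr, Fc, V)
        Fc = hals_update(Fc, Fr, V.T)
        V = project(Fr @ Fc.T)
    return V, Fr, Fc
```

For matrix completion, `project` would clip negatives and reset observed entries; for temporal aggregates it would apply a simplex projection per aggregate.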
Regression models on factors

We introduce regression models in the NMF framework to take into account external factors that influence electricity consumption.

Potential benefits:
- It may improve recovery quality.
- It may help interpret the estimated profiles.
- The regression models may be used for prediction on new periods and new individuals.
Generative low-rank model with exogenous variables

To take into account exogenous variables as side information, we propose a generative low-rank nonnegative model:
- V* has an NMF: V* = Fr Fc^T.
- The data are still a = A(V*).
- Feature matrices Xr ∈ R^{n1×d1} and Xc ∈ R^{n2×d2} are connected to V* through link functions fr: R^{d1} → R^k and fc: R^{d2} → R^k, so that Fr = (fr(Xr))_+ and Fc = (fc(Xc))_+, where the factor matrices are obtained by stacking the row vectors together.

Given this generative model, the task is to estimate fc, fr, Fc, Fr, and V*, given Xr, Xc, A, and a.
Classification of models

The generative model leads to the following optimization problem:

    min_{V, fr ∈ F_r^k, fc ∈ F_c^k} ||V − (fr(Xr))_+ (fc(Xc))_+^T||_F^2
    s.t. A(V) = a, V ≥ 0.

The generative model is very general and includes many known methods as special cases, by specifying:
- the measurement operator A: complete observations, matrix completion, matrix sensing, rank-one matrix projections, temporal aggregates;
- the functional spaces of fr, fc: reduced-rank linear models (Foygel et al. 2012), non-parametric reduced-rank regression (Mukherjee and Zhu 2011);
- the features Xr, Xc: multiple kernel learning (Gönen and Alpaydın 2011), collaborative filtering with graph features (Abernethy et al. 2009).
Extending HALS

The generative model leads to the following optimization problem:

    min_{V, fr ∈ F_r^k, fc ∈ F_c^k} ||V − (fr(Xr))_+ (fc(Xc))_+^T||_F^2
    s.t. A(V) = a, V ≥ 0.

To solve this problem, we extend the HALS algorithm mentioned before: we modify the update function for each rank at each iteration to use the exogenous variables. We call this algorithm HALSX (HALS with eXogenous variables).

Under fairly mild conditions, the limiting points of HALSX are also stationary points. Sufficient conditions for the uniqueness of such a decomposition can be obtained in the case where the link functions are linear.
HALSX: Pseudo-code

Data: A, a, Xr, Xc, F_r, F_c, 1 ≤ k ≤ min{n1, n2}.
Result: V^t, F^t_r ∈ R^{n1×k}_+, f^t_{r,1}, ..., f^t_{r,k} ∈ F_r, F^t_c ∈ R^{n2×k}_+, f^t_{c,1}, ..., f^t_{c,k} ∈ F_c.
Initialize F^0_r, F^0_c ≥ 0, t = 0;
while stopping criterion is not satisfied do
    V^t = argmin_{V | A(V)=a, V≥0} ||V − F^t_r (F^t_c)^T||_F^2;
    R^t = V^t − F^t_r (F^t_c)^T;
    for i = 1, 2, ..., k do
        R^t_i = R^t + F^t_{r,i} (f^t_{c,i})^T;
        Calculate f^{t+1}_{r,i} = argmin_{f ∈ F_r} ||R^t_i − f(Xr) (f^t_{c,i})^T||_F^2;
        F^{t+1}_{r,i} = max(0, f^{t+1}_{r,i}(Xr));
        R^t = R^t_i − F^{t+1}_{r,i} (f^t_{c,i})^T;
    end
    for i = 1, 2, ..., k do
        R^t_i = R^t + F^{t+1}_{r,i} (f^t_{c,i})^T;
        Calculate f^{t+1}_{c,i} = argmin_{f ∈ F_c} ||R^t_i − F^{t+1}_{r,i} f(Xc)^T||_F^2;
        F^{t+1}_{c,i} = max(0, f^{t+1}_{c,i}(Xc));
        R^t = R^t_i − F^{t+1}_{r,i} (F^{t+1}_{c,i})^T;
    end
    t = t + 1;
end
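The rank-wise row update above can be sketched for a linear link function (a sketch of my own: the function name and the choice of ordinary least squares for the inner regression are illustrative, not prescribed by the slides). For a rank-one target, minimizing ||R_i − X_r θ f_c^T||_F^2 over θ reduces to regressing R_i f_c / (f_c^T f_c) on the row features X_r:

```python
import numpy as np

def halsx_row_update(R_i, X_r, f_c):
    """One HALSX rank update with a linear link: fit theta minimizing
    ||R_i - (X_r @ theta) f_c^T||_F^2, then clip the fitted column at zero."""
    # The rank-one least-squares fit reduces to an ordinary regression of
    # R_i @ f_c / (f_c @ f_c) on X_r.
    y = R_i @ f_c / (f_c @ f_c)
    theta, *_ = np.linalg.lstsq(X_r, y, rcond=None)
    return np.maximum(0.0, X_r @ theta), theta
```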
Local convergence of HALSX

The following property is known for HALS (Kim, He, and Park 2014):

Property. For all R ∈ R^{n1×n2} and y ∈ R^{n2}_+ not identically zero, any vector x* that verifies

    x* ∈ argmin_{x ∈ R^{n1}} ||R − x y^T||_F^2

is also a solution to

    min_{x ∈ R^{n1}} ||R − x_+ y^T||_F^2.

In HALSX, we can show the following similar property:

Proposition. Suppose that R ∈ R^{n1×n2} and fc ∈ R^{n2}_+ are not identically zero, and g: R^d → R^{n1}, with d ≥ n1, is a convex differentiable function. Suppose

    θ* ∈ argmin_{θ ∈ R^d} ||R − g(θ) (fc)^T||_F^2.

If the Jacobian matrix of g at θ* is of rank n1, then θ* is also a solution to

    min_{θ ∈ R^d} ||R − (g(θ))_+ (fc)^T||_F^2.

Then, by an argument of strict quasi-convexity, we obtain the convergence result.
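The Property can be checked numerically (a quick sanity experiment of my own, not from the talk): the unconstrained minimizer x* = R y / (y^T y), once clipped at zero, is already optimal for the clipped objective.

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.standard_normal((5, 4))
y = rng.random(4)                         # nonnegative, not identically zero

x_star = R @ y / (y @ y)                  # unconstrained minimizer of ||R - x y^T||_F^2

def clipped_obj(x):
    """Objective with the nonnegativity clipping x_+ applied."""
    return np.linalg.norm(R - np.maximum(x, 0.0)[:, None] * y) ** 2

# No random perturbation of x_star should improve the clipped objective.
best = clipped_obj(x_star)
perturbed = min(clipped_obj(x_star + 0.1 * rng.standard_normal(5)) for _ in range(500))
```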
Experimental setting

Three datasets are used in the experiments:
- Synthetic: a rank-20, 150-by-180 nonnegative matrix simulated following the generative model (n1 = 150, n2 = 180), with synthetic row and column variables.
- French: daily consumption of 473 medium-voltage feeders near Lyon from 2010 to 2012 (n1 = 1096, n2 = 473). Row variables: daily temperature, calendar variables. Column variables: the percentage of each type of client (residential, professional, industrial, high-voltage clients).
- Portuguese: daily consumption of 370 Portuguese clients from 2010 to 2014 (n1 = 1461, n2 = 369). Row variables: daily temperature, calendar variables.

We generate measurements by selecting a number of observation periods, either uniformly over the whole matrix (random), or periodically with a randomly chosen offset for each column (periodic).
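The two sampling schemes can be sketched per column as follows (an illustration of my own; the exact experimental protocol, e.g. how aggregate boundaries are drawn, may differ):

```python
import numpy as np

def observation_periods(n1, n_obs, scheme, rng):
    """Pick n_obs observation periods out of n1 for one column, following the
    two schemes described above (names are illustrative)."""
    if scheme == "random":
        return np.sort(rng.choice(n1, size=n_obs, replace=False))
    # "periodic": evenly spaced readings with a random offset per column
    step = n1 // n_obs
    offset = rng.integers(step)
    return offset + step * np.arange(n_obs)
```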
Recovery or prediction

To test prediction on new individuals and new periods, temporal aggregates are generated on a number of observation periods over the upper-left submatrix. An error metric (RRMSE, defined as ||X̂ − X||_F / ||X||_F) is calculated on each of the four submatrices.
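The RRMSE metric is a one-liner (the function name below is mine):

```python
import numpy as np

def rrmse(X_hat, X):
    """Relative RMSE: ||X_hat - X||_F / ||X||_F."""
    return np.linalg.norm(X_hat - X) / np.linalg.norm(X)
```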
Profiles obtained with the Portuguese dataset

Using external factors, the obtained profiles present visible annual cycles.
Results on time series recovery

Figure: Recovery error (RRMSE) versus sampling rate on the Synthetic, Portuguese, and French datasets, under periodic and random sampling. Algorithms compared: empty_model, softImpute, HALS, NeNMF, HALSX_model, and HALSX with lm, gam, gaussprRadial, or svmLinear regressions.

Using exogenous variables (HALSX_model), the matrix recovery error is in most cases equivalent to, or an improvement over, the NMF methods (NeNMF and HALS). With random observation dates on the synthetic dataset, which is arguably the least realistic case, HALSX_model is slightly worse than NeNMF and HALS.
Results on time series prediction

Figure: Prediction error (row error, column error, RC error) versus sampling rate on synthetic data, under periodic and random sampling. Algorithms compared: rrr, trmf, individual_gam, factor_gam, HALSX_model, and HALSX with lm, gam, gaussprRadial, or svmLinear regressions.
Figure: Prediction error on French data (same layout).

On the synthetic data, the prediction error is rather low for all three prediction types (around 10%), which is remarkable since only very partial data were available in the first place. On the real-world datasets, the prediction error is higher; however, HALSX still outperforms the other benchmark methods. HALSX is not sensitive to the sampling rate.
Conclusions

In this talk we:
- formalized the temporal aggregate observations in electricity consumption as a trace regression model;
- proposed a generative low-rank matrix model to introduce side information into NMF;
- deduced HALSX, an algorithm to solve the new NMF problem;
- tested it on real and synthetic electricity datasets, obtaining results equivalent to or better than reference methods.

The proposed method is implemented in an R package used internally at EDF.
Perspectives of the thesis

Industrial applications
- Instead of estimating the whole time series, NMF can be used to directly or indirectly estimate important statistics, such as peak demand.

Methodological perspectives
- Estimation with both spatial and temporal aggregates
- Usage of social network data as column variables
- Causal relationship between the presence of the measurements and the data that are measured
- Neural networks/deep learning with partial data

Theoretical perspectives
- Is it possible to achieve global convergence of first-order NMF algorithms in special cases?
References

- Jacob Abernethy et al. "A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization". In: The Journal of Machine Learning Research 10 (2009), pp. 803–826.
- Emmanuel J. Candès and Benjamin Recht. "Exact Matrix Completion via Convex Optimization". In: Foundations of Computational Mathematics 9.6 (2009), pp. 717–772. doi: 10.1007/s10208-009-9045-5.
- Yunmei Chen and Xiaojing Ye. "Projection Onto A Simplex". In: arXiv preprint arXiv:1101.6081 (2011).
- Andrzej Cichocki, ed. Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation. Chichester, U.K.: John Wiley, 2009. isbn: 978-0-470-74666-0.
- Rina Foygel et al. "Nonparametric Reduced Rank Regression". In: Advances in Neural Information Processing Systems. 2012, pp. 1628–1636.
- Mehmet Gönen and Ethem Alpaydın. "Multiple Kernel Learning Algorithms". In: Journal of Machine Learning Research 12 (2011), pp. 2211–2268.
- Naiyang Guan et al. "NeNMF: An Optimal Gradient Method for Nonnegative Matrix Factorization". In: IEEE Transactions on Signal Processing 60.6 (2012), pp. 2882–2898. doi: 10.1109/TSP.2012.2190406.
- Jingu Kim, Yunlong He, and Haesun Park. "Algorithms for Nonnegative Matrix and Tensor Factorizations: A Unified View Based on Block Coordinate Descent Framework". In: Journal of Global Optimization 58.2 (2014), pp. 285–319.
- Daniel D. Lee and H. Sebastian Seung. "Learning the Parts of Objects by Non-Negative Matrix Factorization". In: Nature 401.6755 (1999), pp. 788–791.
- Ashin Mukherjee and Ji Zhu. "Reduced Rank Ridge Regression and Its Kernel Extensions". In: Statistical Analysis and Data Mining 4.6 (2011), pp. 612–622. doi: 10.1002/sam.10138.
- Benjamin Recht, Maryam Fazel, and Pablo A. Parrilo. "Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization". In: SIAM Review 52.3 (2010), pp. 471–501.
- Angelika Rohde and Alexandre B. Tsybakov. "Estimation of High-Dimensional Low-Rank Matrices". In: The Annals of Statistics 39.2 (2011), pp. 887–930. doi: 10.1214/10-AOS860.
- Or Zuk and Avishai Wagner. "Low-Rank Matrix Recovery from Row-and-Column Affine Measurements". In: Proceedings of the 32nd International Conference on Machine Learning. 2015, pp. 2012–2020.
How to calculate P_A(X)

For some forms of masks, there are efficient methods.
- Matrix completion: replace the observed entries.
- Temporal aggregates: a simplex projection for each aggregate d,

    min_{v_{I_d}} Σ_{t = t_0(d)+1}^{t_0(d)+h(d)} ( v_t − (f_r)^t (F_c^T)_{n_d} )^2
    s.t. v_{I_d} ≥ 0, v_{I_d}^T 1 = a_d.

  An efficient simplex projection algorithm (Chen and Ye 2011) is used in this case.
- General case: iterate between V = V + A†(a − A(V)) and v_{i,j} = max(0, v_{i,j}).
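The simplex projection itself admits a short sort-based implementation, following the algorithm of Chen and Ye (2011) (the function name below is mine):

```python
import numpy as np

def project_simplex(u, a=1.0):
    """Euclidean projection of u onto {x : x >= 0, sum(x) = a}, a > 0,
    via the sort-and-threshold algorithm of Chen and Ye (2011)."""
    s = np.sort(u)[::-1]                       # sort in decreasing order
    css = np.cumsum(s)
    j = np.arange(1, len(u) + 1)
    rho = np.max(j[s - (css - a) / j > 0])     # last index with a positive gap
    tau = (css[rho - 1] - a) / rho             # the common threshold
    return np.maximum(u - tau, 0.0)
```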
Which functional spaces to choose

HALSX is rather agnostic in the choice of regression models. There is a bias-variance trade-off between flexible models with many parameters and simple models with few parameters. Overfitting can be addressed by cross-validation at each model update.