Extensions of Non-Negative Matrix Factorization (NMF) to Higher Order Data

(1)

Extensions of Non-Negative Matrix Factorization (NMF) to Higher Order Data

HONMF

(Higher Order Non-negative Matrix Factorization)

NTF2D/SNTF2D

((Sparse) Non-negative Tensor Factor 2D Deconvolution)

Data results

The algorithms were used on a dataset containing the inter trial phase coherence (ITPC) of wavelet

transformed EEG data. Briefly stated the data consist of 14 subject recorded during a proprioceptive stimuli consisting of a weight change of left hand during odd trials and right hand during even trials giving a total of 14·2=28 trials. Consequently, the data has the following form X

Channel Time-Frequency  Trials

(Mørup et al. 2006a)

Informatics and Mathematical Modeling

Increasing attention has lately been given to Non-negative Matrix Factorization due to its part based representation and ease of algorithmic implementation (Lee & Seung, 1999 & 2001).

However, NMF is not in general unique – only when data adequately spans the positive orthant (Donoho and Stodden, 2004). Consequently, constraints in the form of sparsity is useful to achieve unique decompositions (Hoyer 2002,2004 Eggert & Körner 2004). As a result, algorithms for sparse coding using multiplicative updates have been derived

(Eggert & Körner 2004, Mørup & Scmidt 2006b)

Sparse Coding NMF:

Sparse Coding NMF regularizes H while keeping W normalizes such that regularization is not simply achieved by letting H go to zero while W goes to infinity (Eggert and Körner, 2004 Mørup & Schmidt 2006b). C_sparse(H) can be any function with positive derivative - a frequently used function is the 1-norm.

Title of Nature article on NMF

from 1999 NMF is based on gradient descent:

Each component is updated by a step in the negative gradient direction

NMF uses the concept of multiplicative updates:

The derivative of the cost function can be split into a positive part _i,d and a negative part _i,d. Choosing the step size as the ratio of W _i,d to the positive part of the derivative _i,d yield multiplicative updates since the gradient step then cancel the W_i,dterm in the gradient based update.

The resulting NMF updates:

The least squares (LS) and Kullback-Leibler (KL) divergence updates

derived from the multiplicative update approach (Lee & Seung, 2001). NMF not in general unique:

If the data does not adequately span the positive orthant no unique solution can be obtained. Here red and green vectors both perfectly span the data points. However, the green vectors represent the solution the most sparse.

NTF

(Non-negative Tensor Factorization) Model

NTF is based on the PARAFAC model (Harshman 1970, Carrol & Chang 1970, Fitzgerald et al., 2005) Model

The HONMF is based on the Tucker model (Tucker, 1977) where non-negativity is imposed on all modalities (Mørup et al. 2006e).

Model

The NTF2D is a PARAFAC model convolutive in 2 dimensions (Mørup & Schmidt 2006c):

Algorithms

Data results

The algorithms were used to analyze the absolute value of the log spectrogram of stereo recordings of music, i.e. the data had the form X

Channel Log-Frequency  Time

Data results

The algorithms were tested on a dataset of flow injection analysis (Nørgaard, 1994 Smilde, 1999), i.e. X

Spectre Time  Batch number

Result obtained by the SNTF2D algorithms (bottom panel) when decomposing the log-spectrogram of synthetically generated stereo music (middle panel) generated from the true components given in the top panel.

Decomposition result of a real stereo recording of music consisting of a Flute and Harp playing ”The Fog is Lifting” by Carl Nielsen. Scores given at the top. Clearly the SNTF2D separates the log-spectrogram into two components pertaining to the harp and flute respectively. By spectral masking of the log- spectrograms the two components are reconstructed revealing that the one component indeed pertains to the harp whereas the other pertains to the flute.

Morten Mørup,

Department of Signal Processing, Informatics and Mathematical Modeling, Technical University of Denmark, mm@imm.dtu.dk webpage: www.imm.dtu.dk/~mm

Parts of the above work done in collaboration with (see also references):

Lars Kai Hansen, Professor

Department of Signal Processing Informatics and Mathematical Modeling,

Technical University of Denmark

Mikkel N. Schmidt, Stud. PhD

Department of Signal Processing Informatics and Mathematical Modeling,

Technical University of Denmark

Mathematical notation:

Sidse M. Arnfred, Dr. Med. PhD

Cognitive Research Unit Hvidovre Hospital University Hospital of Copenhagen

References:

Carroll, J. D. and Chang, J. J. Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young"

decomposition, Psychometrika 35 1970 283--319

Eggert, J. and Korner, E. Sparse coding and NMF. In Neural Networks volume 4, pages 2529-2533, 2004

Eggert, J et al Transformation-invariant representation and nmf. In Neural Networks, volume 4 , pages 535-2539, 2004

Fiitzgerald, D. et al. Non-negative tensor factorization for sound source separation. In proceedings of Irish Signals and Systems Conference, 2005 FitzGerald, D. and Coyle, E. C Sound source separation using shifted non.-negative tensor factorization. In ICASSP2006, 2006

Fitzgerald, D et al. Shifted non-negative matrix factorization for sound source separation. In Proceedings of the IEEE conference on Statistics in Signal Processing. 2005

Harshman, R. A. Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis},UCLA Working Papers in Phonetics 16 1970 1—84

Lathauwer, Lieven De and Moor, Bart De and Vandewalle, Joos MULTILINEAR SINGULAR VALUE DECOMPOSITION.SIAM J. MATRIX ANAL. APPL.2000 (21)1253–1278

Lee, D.D. and Seung, H.S. Algorithms for non-negative matrix factorization. In NIPS, pages 556-462, 2000 Lee, D.D and Seung, H.S. Learning the parts of objects by non-negative matrix factorization, NATURE 1999

Mørup, M. and Hansen, L.K.and Arnfred, S.M.Decomposing the time-frequency representation of EEG using nonnegative matrix and multi-way factorization Technical report, Institute for Mathematical Modeling, Technical University of Denmark, 2006a

Mørup, M. and Schmidt, M.N. Sparse non-negative matrix factor 2-D deconvolution. Technical report, Institute for Mathematical Modeling, Tehcnical University of Denmark, 2006b

Mørup, M and Schmidt, M.N. Non-negative Tensor Factor 2D Deconvolution for multi-channel time-frequency analysis. Technical report, Institute for Mathematical Modeling, Technical University of Denmark, 2006c

Schmidt, M.N. and Mørup, M. Non-negative matrix factor 2D deconvolution for blind single channel source separation. In ICA2006, pages 700- 707, 2006d

Mørup, M. and Hansen, L.K.and Arnfred, S.M. Algorithms for Sparse Higher Order Non-negative Matrix Factorization (HONMF), Technical report, Institute for Mathematical Modeling, Technical University of Denmark, 2006e

Nørgaard, L and Ridder, C.Rank annihilation factor analysis applied to flow injection analysis with photodiode-array detection Chemometrics and Intelligent Laboratory Systems 1994 (23) 107-114

Schmidt, M.N. and Mørup, M. Sparse Non-negative Matrix Factor 2-D Deconvolution for Automatic Transcription of Polyphonic Music, Technical report, Institute for Mathematical Modelling, Tehcnical University of Denmark, 2005

Smaragdis, P. Non-negative Matrix Factor deconvolution; Extraction of multiple sound sources from monophonic inputs. International Symposium on independent Component Analysis and Blind Source Separation (ICA)W

Smilde, Age K. Smilde and Tauller, Roma and Saurina, Javier and Bro, Rasmus, Calibration methods for complex second-order data Analytica Chimica Acta 1999 237-251

Tamara G. Kolda Multilinear operators for higher-order decompositions technical report Sandia national laboratory 2006 SAND2006-2081.

Tucker, L. R. Some mathematical notes on three-mode factor analysis Psychometrika 31 1966 279—311 Welling, M. and Weber, M. Positive tensor factorization. Pattern Recogn. Lett. 2001

And also on the inter trial phase coherence (ITPC) of EEG data (see section on NTF for dataset details).

The NTF decomposition reveals a right parietal activity mainly present during odd trials corresponding to left hand stimuli as well as a more frontal and a higher frequent central parietal activity

While the HONMF is not unique when no sparseness is imposed, it becomes unique when imposing sparseness on the core. Here revealing that the appropriate model to the data is a PARAFAC model (Mørup et al., 2006e). Furthermore, the HONMF decomposition gives a more part based representation that is easier to interpret than the solution found by HOSVD (Lathauwer et al., 2000).

The HONMF with sparseness imposed on the core and third modality resulted in a very consistent decomposition of the flow injection data capturing unsupervised the true concentrations present in each batch (given by modality 3).

The PARAFAC model is a generalization of the factor analysis to higher orders, where the data is explained by an outer product of factor effects pertaining to each modality. To the right is given the general expression of the PARAFAC model for N-order tensors

S yn th et ic d at a T ru e st er eo m u si c

Three equivalent ways of stating the Tucker model. The Tucker model accounts for all possible linear interactions between the factor effects pertaining to each modality.

Table giving how to update when imposing sparseness/normalizing the various modalities of the model

Updates for the NTF2D - by including updates marked in gray sparseness is imposed on H forming the SNTf2D.