Approximation of a matrix with sub-matrices : Algorithms and Applications

Petros Drineas

Rensselaer Polytechnic Institute

Monday, April 3 at 1:15 in LOM 201

ABSTRACT
The Singular Value Decomposition (SVD) of matrices (and the related Principal Components Analysis) has found numerous applications in extracting structure from datasets in various domains, e.g., the social sciences, biology, chemistry, medicine, Internet data, etc. The extracted structure is expressed in terms of singular vectors, which are linear combinations of all the input data and lack an intuitive physical interpretation. In this talk we shall discuss a matrix decomposition of the form CUR which, given an m-by-n matrix A, approximates A by the product CUR, where C consists of a few columns of A, R consists of a few rows of A, and U is a carefully constructed, constant-sized matrix. Such decompositions might be easier to interpret in applications, since both C and R are subsets of the data. Previous matrix decompositions of this form only guaranteed additive error approximations. Our construction for C, U, and R takes cubic time and guarantees that the approximation is almost the best possible in the sense of relative error. Finally, we will discuss applications of this algorithm in the analysis of human genotype and microarray cancer data. Return to DMTCS home page.

ABSTRACT


The Singular Value Decomposition (SVD) of matrices (and the related
Principal Components Analysis) has found numerous applications in extracting
structure from datasets in various domains, e.g., the social sciences,
biology, chemistry, medicine, Internet data, etc. The extracted structure is
expressed in terms of singular vectors, which are linear combinations of all
the input data and lack an intuitive physical interpretation. In this
talk we shall discuss a matrix decomposition of the form CUR which, given an
m-by-n matrix A, approximates A by the product CUR, where C consists of a
few columns of A, R consists of a few rows of A, and U is a carefully
constructed, constant-sized matrix. Such decompositions might be easier to
interpret in applications, since both C and R are subsets of the data.
Previous matrix decompositions of this form only guaranteed additive error
approximations. Our construction for C, U, and R takes cubic
time and guarantees that the approximation is almost the best possible
in the sense of relative error.

Finally, we will discuss applications of this algorithm in the analysis of
human genotype and microarray cancer data.

Return to DMTCS home page.