Fast, Scalable and Geo-Distributed PCA for Big Data Analytics

Published in Information Systems, 2021

We propose TallnWide, a fast, scalable, geo-distributed principal component analysis (PCA) algorithm. It computes PCA on extremely high-dimensional data without intermediate memory overflow by dividing the data into blocks while minimizing communication overhead. The method handles $\mathbf{10\times}$ higher dimensionality in $\mathbf{2.9\times}$ less time than the baselines, enabling practical deployment in distributed big data settings.

Recommended citation: Adnan TMT, Tanjim MM, Adnan MA. Fast, Scalable and Geo-Distributed PCA for Big Data Analytics. Information Systems. 2021;98:101710.