Ideas for methodology (especially PCA and Isomap) - AAU-Dat/P5-Nonlinear-Dimensionality-Reduction GitHub Wiki

Motivation

How do we know whether a dimensionality reduction (DR) method is good or not? One way is to evaluate it through a machine learning model and use the model's accuracy as an indicator of how good the DR method is. It could also be relevant to consider how much time it takes to perform the DR method.
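A minimal sketch of this accuracy-plus-runtime evaluation, assuming scikit-learn's `PCA` and `Isomap`. The downstream classifier (5-nearest-neighbours), the 25% held-out split, the component count, the helper name `evaluate_dr` and the use of scikit-learn's small `load_digits` dataset as a stand-in for MNIST are all illustrative assumptions, not the project's actual setup. Only the DR fit and transform are timed, so the classifier's own cost does not blur the comparison between DR methods.

```python
import time

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def evaluate_dr(reducer, X_train, X_test, y_train, y_test):
    """Fit `reducer` on the training split, then score a downstream classifier.

    Returns (accuracy, seconds spent fitting and transforming with the DR method).
    """
    start = time.perf_counter()
    Z_train = reducer.fit_transform(X_train)  # learn the embedding on training data
    Z_test = reducer.transform(X_test)        # project held-out data into it
    dr_seconds = time.perf_counter() - start

    # Hypothetical choice of downstream model: a 5-nearest-neighbour classifier.
    clf = KNeighborsClassifier(n_neighbors=5).fit(Z_train, y_train)
    accuracy = clf.score(Z_test, y_test)  # mean accuracy on the held-out split
    return accuracy, dr_seconds


if __name__ == "__main__":
    # load_digits (8x8 images) is a small stand-in for MNIST in this sketch.
    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y
    )
    for name, reducer in [("PCA", PCA(n_components=10)),
                          ("Isomap", Isomap(n_neighbors=10, n_components=10))]:
        acc, secs = evaluate_dr(reducer, X_train, X_test, y_train, y_test)
        print(f"{name}: accuracy={acc:.3f}, DR time={secs:.2f}s")
```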

One measure that formalizes the quality of a DR method is Kruskal's stress. Kruskal's stress evaluates how well the method preserves the dissimilarities between data samples (before and after embedding), so methods that distort the dataset are considered bad (1). An example of such distortion in the case of Isomap is given on page 54 of (1).
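Kruskal's stress-1 can be written as sqrt( Σ_{i<j} (δ_ij − d_ij)² / Σ_{i<j} d_ij² ), where δ_ij is the dissimilarity between samples i and j before embedding and d_ij is their distance after embedding. The sketch below is the metric form of that formula, assuming Euclidean distances in both spaces; Kruskal's original non-metric version replaces δ_ij with monotonically fitted disparities, and (1) may use a different variant. `kruskal_stress` is a hypothetical helper name.

```python
import numpy as np
from scipy.spatial.distance import pdist


def kruskal_stress(X_original, X_embedded):
    """Kruskal's stress-1, metric form:

        sqrt( sum_{i<j} (delta_ij - d_ij)^2 / sum_{i<j} d_ij^2 )

    delta_ij: Euclidean distance between samples i and j in the original space
              (assumption: this is the dissimilarity we want preserved)
    d_ij:     Euclidean distance between the same two samples in the embedding
    0 means every pairwise distance is preserved; larger means more distortion.
    """
    # pdist returns the condensed vector of all i < j pairwise distances.
    delta = pdist(np.asarray(X_original, dtype=float))
    d = pdist(np.asarray(X_embedded, dtype=float))
    return float(np.sqrt(np.sum((delta - d) ** 2) / np.sum(d ** 2)))
```

Since Isomap tries to preserve geodesic rather than Euclidean distances, it can get a high stress against Euclidean input distances precisely when it "unrolls" a manifold; using geodesic distances as δ_ij is an alternative worth considering. The computation touches all n(n−1)/2 pairs, so it is best run on a subsample.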

We can test on the dev set with different numbers of samples and components to show that Isomap performs worse than PCA on MNIST. Ref (1) notes that Isomap may perform poorly on real-world datasets, as opposed to artificial ones such as the Swiss Roll. Ray has tried Isomap on the raw dataset, usually getting about 30% accuracy with the first 10000 samples. With 20000 samples the program crashed. This means that either the computer is not powerful enough, or more preprocessing/optimization is needed in order to use the whole dataset.
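A sketch of the (samples × components) grid on a dev split, assuming scikit-learn, a 5-NN classifier, a 75/25 train/dev split and `load_digits` as a small stand-in for MNIST; `run_grid` and `dense_matrix_gb` are hypothetical helper names. It also estimates memory: if scikit-learn's `Isomap` was used, it stores the geodesic distances between all training samples as a dense n × n float64 matrix (its `dist_matrix_` attribute), i.e. n² × 8 bytes. That is 0.8 GB for 10000 samples, 3.2 GB for 20000 and 28.8 GB for all 60000 MNIST training images, plus further n × n working copies while the kernel is built and decomposed. Running out of memory is therefore one plausible cause of the crash at 20000 samples, though not a confirmed one.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def dense_matrix_gb(n_samples, bytes_per_value=8):
    """Memory in GB for one dense n x n float64 matrix, such as the geodesic
    distance matrix Isomap keeps for its training samples."""
    return n_samples ** 2 * bytes_per_value / 1e9


def run_grid(X, y, sample_sizes, component_counts, n_neighbors=10, seed=0):
    """Dev-set accuracy of PCA vs Isomap for every (samples, components) pair.

    Returns a list of (method, n_samples, n_components, accuracy) tuples.
    """
    results = []
    for n in sample_sizes:
        # Take the first n samples, mirroring the "first 10000 samples" experiment.
        X_n, y_n = X[:n], y[:n]
        X_tr, X_dev, y_tr, y_dev = train_test_split(
            X_n, y_n, test_size=0.25, random_state=seed, stratify=y_n)
        for k in component_counts:
            reducers = {
                "PCA": PCA(n_components=k),
                "Isomap": Isomap(n_neighbors=n_neighbors, n_components=k),
            }
            for name, reducer in reducers.items():
                Z_tr = reducer.fit_transform(X_tr)
                Z_dev = reducer.transform(X_dev)
                # Hypothetical downstream model: 5-nearest-neighbour classifier.
                clf = KNeighborsClassifier(n_neighbors=5).fit(Z_tr, y_tr)
                results.append((name, n, k, clf.score(Z_dev, y_dev)))
    return results


if __name__ == "__main__":
    for n in (10_000, 20_000, 60_000):
        print(f"n={n}: ~{dense_matrix_gb(n):.1f} GB per dense n x n matrix")

    # load_digits (1797 samples, 64 features) stands in for MNIST in this sketch.
    X, y = load_digits(return_X_y=True)
    for row in run_grid(X, y, sample_sizes=[1000, 1797], component_counts=[2, 10, 30]):
        print(row)
```

Reducing the data with PCA first (say, to a few dozen components) speeds up Isomap's neighbour search but does not shrink the n × n matrix. Fitting Isomap on a subset and calling `transform` on the remaining samples in batches keeps memory bounded by roughly batch size × subset size instead.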

Ref: (1) https://repositorio.ufscar.br/bitstream/handle/ufscar/14806/final-report-lucas-david-a-study-of-the-isomap-algorithm-and-its-applications-in-machine-learning.pdf?sequence=1

BibTeX: @article{david2015study, title={A Study of the ISOMAP Algorithm and Its Applications in Machine Learning}, author={David, Lucas Oliveira}, year={2015}, publisher={Universidade Federal de S{\~a}o Carlos} }