Experimental results
First, experiments have been made on a large number of simulated data sets generated by varying <math>k, d, m_{\ell}, \Sigma_{\ell}=\Psi^2_{\ell}I \,\ .</math> The task is to check whether <math>k, \{ m_{\ell}\}\,</math> can be correctly selected. For comparison, we conduct maximum likelihood (ML) learning by the EM algorithm for parameter estimation, and then make model selection on <math>k, \{ m_{\ell}\}\,</math> by several typical criteria, including AIC and its modification CAIC, BIC (or equivalently MDL), and cross-validation (CV). Moreover, it is also intended to make a comparison with a VC-dimension based SRM error bound. After an extensive search of the existing literature, only one such criterion has been found, for selecting <math>k\,</math> on a Gaussian mixture (Wang & Feng, 2005), while no criterion is available for local factor analysis. Via <math>q(x, \ell) \,</math> in <figref></figref>, we have been able to use the criterion in (Wang & Feng, 2005) for <math>k\,</math>, but unable to determine the hidden factor number <math> m_{\ell} \,</math> for each Gaussian component. Furthermore, comparisons have also been made with two algorithms that make the selection on <math>k, \{ m_{\ell}\} \,</math> incrementally, so that the huge computing cost of a two-stage implementation of ML + criterion can be significantly saved. One is variational inference for Bayesian mixtures of factor analyzers (VBMFA) (Ghahramani & Beal, 1999) and the other is the incremental mixture of factor analyzers (IMoFA) (Salah & Alpaydin, 2004).
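For a concrete picture of the two-stage ML + criterion procedure, the following is a minimal sketch using scikit-learn's GaussianMixture, whose fit, aic, and bic methods cover the EM and criterion steps. A plain Gaussian mixture stands in here as an assumption for illustration, since scikit-learn has no mixture-of-factor-analyzers class; selecting the per-component factor numbers <math>m_{\ell}\,</math> is therefore omitted.

```python
# Minimal sketch of the two-stage "ML + criterion" procedure, illustrated
# on a plain Gaussian mixture (a stand-in: scikit-learn has no
# mixture-of-factor-analyzers class, so the m_l are not selected here).
import numpy as np
from sklearn.mixture import GaussianMixture

def select_k_by_criterion(X, k_max=10, criterion="bic"):
    """Stage 1: fit by EM for each k. Stage 2: keep the k minimizing the criterion."""
    best_k, best_score, best_model = None, np.inf, None
    for k in range(1, k_max + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              n_init=3, random_state=0).fit(X)
        score = gmm.bic(X) if criterion == "bic" else gmm.aic(X)
        if score < best_score:
            best_k, best_score, best_model = k, score, gmm
    return best_k, best_model

# Example: three well-separated clusters; BIC should recover k = 3.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(200, 2))
               for c in (0.0, 4.0, 8.0)])
k, model = select_k_by_criterion(X, k_max=6)
print("selected k =", k)
```

Note that every candidate <math>k\,</math> (and, for local factor analysis, every combination of <math>\{ m_{\ell}\}\,</math>) requires a full EM run, which is the source of the heavy computing cost that the incremental algorithms avoid.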
Corresponding to those criteria, we implement BYY-C, i.e., BYY harmony learning via a two-stage implementation by eq(<figref></figref>) together with eq(<figref></figref>), while corresponding to VBMFA and IMoFA, we implement BYY-A, i.e., BYY learning with automatic model selection. Both performance and computing time are compared in the experiments.
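The general mechanism behind automatic model selection can be illustrated by a rough sketch: start learning with more components than needed and discard any component whose mixing weight is driven toward zero. The plain EM updates and the prune_tol threshold below are assumptions for illustration only; the actual BYY-A harmony updates differ.

```python
# Illustrative sketch of automatic model selection: start from an
# over-large k and prune components whose mixing weight collapses.
# Plain EM updates and the prune threshold are illustrative assumptions,
# not the actual BYY harmony learning rules.
import numpy as np
from scipy.stats import multivariate_normal

def fit_with_pruning(X, k_init=10, prune_tol=1e-2, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, k_init, replace=False)]
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(d)] * k_init)
    weights = np.full(k_init, 1.0 / k_init)
    for _ in range(n_iter):
        # E-step: posterior responsibilities of each component.
        dens = np.column_stack([w * multivariate_normal.pdf(X, m, c)
                                for w, m, c in zip(weights, means, covs)])
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: re-estimate weights, means, covariances.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        covs = np.array([(resp[:, j, None] * (X - means[j])).T @ (X - means[j]) / nk[j]
                         + 1e-6 * np.eye(d) for j in range(len(nk))])
        # Prune collapsed components and renormalize the rest.
        keep = weights > prune_tol
        if not keep.all():
            weights, means, covs = weights[keep], means[keep], covs[keep]
            weights /= weights.sum()
    return weights, means, covs

weights, means, covs = fit_with_pruning(X, k_init=10)
print("surviving components:", len(weights))
```

Because selection happens during a single learning run rather than over a grid of candidate models, this style of algorithm accounts for the large computing-time savings reported below.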
Given in <figref>Exp1-1.GIF</figref> are experimental results on three simulated data sets with samples of small, medium, and large size, respectively. Among the existing criteria plus VBMFA and IMoFA, none is always best; instead, some perform better in one case while others perform better in another. Interestingly, BYY-C considerably outperforms all these criteria as well as VBMFA, IMoFA, and BYY-A; BYY-A outperforms its counterparts VBMFA and IMoFA, while VBMFA and IMoFA perform quite similarly. Moreover, the computing times used by BYY-A and IMoFA are similar, but both are only <math>3\%-30\% \,</math> of the computing times of the two-stage-implemented criteria and BYY-C. Though inferior to BYY-C, BYY-A is still better than or comparable to the best performer among the criteria as well as VBMFA and IMoFA, with a considerable saving in computing cost. The same observations are consistently obtained from <figref>Exp1-1.GIF</figref> with comparisons made on 27 data sets.
Second, experiments have also been made on a number of real-world data sets for pattern recognition tasks. On these data sets, it is hard to directly check whether <math>k, \{ m_{\ell}\} \,</math> are appropriate. Instead, what can be directly compared are the average classification rates on the testing sets. Shown in <figref>Exp2.gif</figref> are the comparison results on several widely used data sets. To save computing cost, as required in real applications, we take only BYY-A to compare with the other approaches. Again, it can be observed that BYY-A outperforms the others in most cases, with a computing time similar to IMoFA but only <math>3\%-30\% \,</math> of the computing times needed by those two-stage-implemented criteria.
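The classification rates above come from using the learned mixtures as class-conditional densities: fit one mixture per class and assign each test point to the class with the largest prior times class-conditional likelihood. The following is a minimal sketch of that evaluation, with scikit-learn's GaussianMixture standing in for local factor analysis purely for illustration.

```python
# Sketch of mixture-based classification: one mixture per class, predict
# by the largest class prior times class-conditional likelihood.  Plain
# Gaussian mixtures stand in for local factor analysis for illustration.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_per_class(X_train, y_train, k=3):
    models, priors = {}, {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        models[c] = GaussianMixture(n_components=k, random_state=0).fit(Xc)
        priors[c] = len(Xc) / len(X_train)
    return models, priors

def classification_rate(models, priors, X_test, y_test):
    classes = sorted(models)
    # score_samples gives per-sample log-likelihood under each class model.
    log_post = np.column_stack([np.log(priors[c]) + models[c].score_samples(X_test)
                                for c in classes])
    y_pred = np.array(classes)[log_post.argmax(axis=1)]
    return (y_pred == y_test).mean()
```

The reported classification rate is then simply the fraction of test points whose maximum-posterior class matches the true label, averaged over the testing sets.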
The above experiments were all conducted by Mr. Lei Shi. Readers are referred to Web-link II for further details, for experiments on other data sets, and for applications to two widely used handwritten digit databases, MNIST and CEDAR.
Readers are referred to (Xu, 1995, 2000, 2001a&b, 2002, 2003, 2004b&c, 2005, 2007a&b) for details and overviews, as well as for results on a number of typical learning tasks, some of which are listed as follows:
- Cluster analysis, Gaussian mixture, and mixture of shape-structures (including lines, planes, curves, surfaces, and even complicated shapes).
- Factor analysis (FA) and local FA, including PCA, subspace analysis and local subspaces, etc.
- Independent subspace analysis, including independent component analysis (ICA), binary factor analysis (BFA), non-Gaussian factor analysis (NFA), and LMSER, as well as three-layer nets.
- Independent state space analysis, including temporal factor analysis (TFA), independent hidden Markov model (HMM), temporal LMSER, and variants.
- Combination of multiple inferences, including multiple classifier combination, RBF nets, mixture of experts, etc.
- Ghahramani, Z & Beal, M (1999), "Variational inference for Bayesian mixtures of factor analysers", Advances in Neural Information Processing Systems 12, MIT Press.
- Salah, A & Alpaydin, E (2004), "Incremental mixtures of factor analysers", Proc. 17th Intl Conf. on Pattern Recognition, vol. 1, 276-279.
- Shi, L (2008), "Bayesian Ying-Yang harmony learning for local factor analysis: a comparative investigation", in Tizhoosh & Ventresca (eds), Oppositional Concepts in Computational Intelligence, Springer-Verlag, 209-232.
- Sun, K, Tu, SK, Gao, DY, & Xu, L (2009), "Canonical dual approach to binary factor analysis", to appear in Proc. 8th Intl Conf. on Independent Component Analysis and Signal Separation (ICA 2009), Paraty, Brazil, March 15-18, 2009.
- Wang, L & Feng, J (2005), "Learning Gaussian mixture models by structural risk minimization", Proc. 4th Intl Conf. on Machine Learning and Cybernetics, August 18-21, 2005, Guangzhou, China, vol. 8, 4858-4863.