RF Wins Figure - Michael-Ainsworth/progressive-learning GitHub Wiki
To demonstrate this relationship, we generated simulation data from the sparse parity distribution. Sparse parity is a p-dimensional binary classification problem that generalizes the noisy XOR distribution. Each sample is a p-dimensional feature vector with X1, ..., Xp ~ iid U(-1, 1). A parameter p* < p gives the number of informative dimensions: the class label is Y = 0 if the number of positive values among the first p* dimensions is even, and Y = 1 otherwise. In this experiment, p = 14 and p* = 3. The parametric DN uses a single hidden layer with 4 nodes and mini-batch sampling of size 3. The figure shows classification error L(g) against the number of training samples n.
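As a minimal sketch, the sampling scheme described above could be implemented as follows (the function name and seed are illustrative, not part of the original experiment code):

```python
import numpy as np

def sparse_parity(n, p=14, p_star=3, seed=0):
    """Draw n samples from the sparse parity distribution.

    Features X_1, ..., X_p are iid Uniform(-1, 1); only the first
    p_star dimensions are informative. The label is Y = 0 when the
    count of positive values among those dimensions is even, and
    Y = 1 otherwise.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, size=(n, p))
    # Count positives among the informative dimensions; odd count -> Y = 1.
    positives = (X[:, :p_star] > 0).sum(axis=1)
    y = positives % 2
    return X, y

X, y = sparse_parity(1000)
```

With p = 14 and p* = 3 as in the experiment, the remaining 11 dimensions are pure noise, so a learner must identify the informative subspace to drive L(g) down.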
Figure 3b: Performance comparison of a DF and a DN on the sparse parity distribution: DF in red; DN in blue. Sparse parity uses parameters p = 14 and p* = 3. The DN has a specific parametric architecture using 1 hidden layer with 4 nodes. The x-axis is training set sample size n; the y-axis is classification error L(g).
We can see that at sufficiently large sample sizes, the DF achieves a lower classification error than the DN.