Monday, February 19, 2018

Regularization and feature selection (Feb 19)

We continued the discussion of regularization, and noted that the L1 norm tends to produce weights that are exactly zero. This is explained by the figure on the right, where the regularized solutions are always restricted to a region defined by the norm and its upper bound C. Because of the shape of the L1 region (the red square), the minimum tends to occur at one of its corners, and at a corner some of the coefficients are exactly zero. The smaller the value of C, the smaller the region, and the more likely we are to hit one of the corners.
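To make this concrete, below is a minimal scikit-learn sketch (my own illustration, not code from the lecture) that fits Lasso (L1) and Ridge (L2) regression on synthetic data; the data and the alpha values are arbitrary placeholder choices. The L1 solution sets most coefficients to exactly zero, while the L2 solution only shrinks them.

```python
# Minimal sketch: L1 (Lasso) vs L2 (Ridge) regularization on synthetic data.
# The data and alpha values are illustrative placeholders.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 10)
true_w = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 0])  # only 2 informative features
y = X @ true_w + 0.1 * rng.randn(100)

lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty
ridge = Ridge(alpha=0.1).fit(X, y)  # L2 penalty

print("Lasso coefficients:", np.round(lasso.coef_, 3))  # redundant ones typically exactly 0
print("Ridge coefficients:", np.round(ridge.coef_, 3))  # small but nonzero
```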

Apart from the L1 and L2 norms, there is literature on other penalties. Especially Lq for q < 1 is interesting (strictly speaking it is no longer a norm), because it results in even sparser weight vectors. Unfortunately, such penalties make the objective non-convex, and the solutions may be hard to find (the endpoint depends on the initial guess). An example of the regions obtained this way is shown on the right, and now it is even easier to hit a corner (and obtain zero coefficients). Image credits: Caron et al., "Sparse Bayesian nonparametric regression", ICML 2008.
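Since the figure itself is not reproduced here, the sketch below (my own illustration, not the figure from the post) draws the boundary |w1|^q + |w2|^q = 1 in two dimensions for a few values of q. For q < 1 the region becomes non-convex, with spikes along the axes, which is why the corners are hit even more easily.

```python
# Sketch: boundaries of the regions |w1|^q + |w2|^q <= 1 for a few q values.
import numpy as np
import matplotlib.pyplot as plt

for q in [0.5, 1.0, 2.0]:
    t = np.linspace(0, np.pi / 2, 200)
    # First-quadrant arc: with w1 = cos(t)**(2/q) and w2 = sin(t)**(2/q),
    # we get |w1|**q + |w2|**q = cos(t)**2 + sin(t)**2 = 1.
    w1 = np.cos(t) ** (2.0 / q)
    w2 = np.sin(t) ** (2.0 / q)
    # Mirror the arc into the other three quadrants to close the curve.
    w1_full = np.concatenate([w1, -w1[::-1], -w1, w1[::-1]])
    w2_full = np.concatenate([w2, w2[::-1], -w2, -w2[::-1]])
    plt.plot(w1_full, w2_full, label=f"q = {q}")

plt.axis("equal")
plt.legend()
plt.title("Constraint regions for different q")
plt.show()
```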

The feature selection property is useful, since discarding uninformative features can improve accuracy. It can also be paired with a feature generator that produces a large number of redundant features. An example of this was our ICANN2011 MEG Mind Reading Competition submission. In that challenge, we generated altogether 11 different feature sets (statistics such as means, standard deviations, variances, ...), and selected the best feature set using both the L1 norm and a brute-force search over all combinations of two of the 11 statistics (more details here; available from the TUT network).
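As a hedged sketch of this workflow (placeholders only, not the actual competition features or code), an L1-penalized model can drive the selection step, for example via scikit-learn's SelectFromModel:

```python
# Sketch: L1-based feature selection with SelectFromModel.
# Data and parameters are placeholders, not the competition setup.
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(200, 50)                   # stand-in for a large set of generated features
y = (X[:, 0] - X[:, 1] > 0).astype(int)  # only two features actually matter

# The L1 penalty drives most coefficients to exactly zero;
# SelectFromModel keeps the features with nonzero weight.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
selector = SelectFromModel(clf, prefit=True)
print("Selected feature indices:", np.flatnonzero(selector.get_support()))
```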

For the competition, you may also check the MLSP2013 Birds competition, which is closely related to the current competition in terms of data. A summary paper was written about the solutions the different teams used (available from the TUT network).


Prep for exam, visiting lectures (Feb 22)

In the last lecture, we spent the first 30 minutes on an old exam. In particular, we learned how to calculate the ROC and AUC for a test sam...