Next, we studied L1 and L2 regularization with linear models. Regularization helps to avoid overfitting, where your model performs very well on the training data but poorly on the test data (see also the previous blog entry where the convnet overfits). These regularization techniques add a penalty to the loss function (such as log-loss or hinge loss), where the penalty is either the L1 or L2 norm of the weight vector. Note also that the same technique applies to deep nets, which are essentially stacked LogReg models. For example, in Keras, the regularizers module lets you add such a penalty if your net seems to overfit. In DNNs the most widely used regularizer is probably dropout, which we discussed earlier, but all of them serve the same purpose: avoiding overfitting to the training data.
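As a concrete illustration, here is a minimal sketch (not from the course notebooks) of both ideas: the penalty type is a one-line choice in scikit-learn's linear models, and in Keras the regularizers module and the Dropout layer play the corresponding roles. The penalty strengths, layer sizes and input dimension below are arbitrary placeholders.

```python
from sklearn.linear_model import LogisticRegression
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Linear model: the penalty argument selects the L1 or L2 norm of the weights;
# C is the inverse regularization strength (smaller C = stronger penalty).
lr_l2 = LogisticRegression(penalty="l2", C=1.0)
lr_l1 = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)

# Deep net: an L2 weight penalty on a dense layer plus dropout between layers.
model = keras.Sequential([
    keras.Input(shape=(100,)),            # 100 input features, arbitrary for this sketch
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(0.01)),  # L2 penalty on the weights
    layers.Dropout(0.5),                  # randomly zero 50% of activations during training
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```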
There was a question about what tricks the winning teams use to succeed in the competition. Among the reports due last Monday, I spotted a few widely used ones. I don't want to spoil the competition by revealing them all, but here are a few general tricks:
- Model averaging: instead of using just one model, train several predictors (LR, SVM, RF, deep net) and average their predictions, either by majority vote (the class predicted most often wins) or by averaging the class probabilities; see the first sketch after this list. If you want to get fancy, you can try different combinations of predictors on your local test bench and automate the choice of which predictors to include. I did this in this very related competition (you can find my name on the leaderboard); also check the "congratulations thread" on the discussion forum there.
- Semi-supervised learning: with this one you can learn from the test data as well. Googling for "semi supervised logistic regression" turns up many papers where the test data is used to aid the classification. Then there is the dirty trick we used in this competition (check the leaderboard here as well):
- Train a model with the training data
- Predict labels for the test data
- Append the test samples with their predicted labels to the training set
- Retrain the model on train + test data, using the fake labels for the test part
- Predict the labels for test data again
- The accuracy of the above usually improves if you include only those test samples whose predicted probability is high (e.g. > 0.6); see the second sketch below.
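To make the averaging trick concrete, here is a minimal sketch of probability averaging with scikit-learn. The synthetic data and the particular model choices are placeholders, not a recipe for the competition itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in data; replace with the competition features and labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = [
    LogisticRegression(max_iter=1000),
    SVC(probability=True),               # probability=True enables predict_proba
    RandomForestClassifier(n_estimators=200, random_state=0),
]

# Average the predicted class probabilities over all models.
probas = [m.fit(X_train, y_train).predict_proba(X_test) for m in models]
avg_proba = np.mean(probas, axis=0)
y_pred = avg_proba.argmax(axis=1)        # a majority vote over m.predict(X_test) works too
```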
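And here is a sketch of the self-training steps listed above, again with scikit-learn on toy data; the 0.6 threshold is the same heuristic mentioned in the last bullet, everything else is illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Toy stand-in data; in the competition X_test would be the unlabelled test set.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, _ = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)                        # 1. train on the labelled data

proba = clf.predict_proba(X_test)                # 2. predict labels for the test data
pseudo = proba.argmax(axis=1)
confident = proba.max(axis=1) > 0.6              # keep only confident predictions

X_aug = np.vstack([X_train, X_test[confident]])  # 3. append confident test samples
y_aug = np.concatenate([y_train, pseudo[confident]])

clf.fit(X_aug, y_aug)                            # 4. retrain on train + test with fake labels
final_pred = clf.predict(X_test)                 # 5. predict the test labels again
```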