However, before that we discussed the problem of model comparison in the competition. Namely, train_test_split will mix samples from the same recording into the train and validation sets, which gives an overly optimistic score of about 99%. Here is a great post on the topic.
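One way to avoid this leakage is to split by recording rather than by individual sample. Below is a minimal sketch using scikit-learn's GroupShuffleSplit; the feature matrix, labels and the groups array of recording identifiers are made-up placeholders, not data from the competition code.

# Hedged sketch: leakage-free split, assuming each sample carries a
# recording identifier in `groups` (hypothetical array, for illustration only).
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.random.rand(100, 40)           # dummy feature matrix
y = np.random.randint(0, 10, 100)     # dummy labels
groups = np.repeat(np.arange(20), 5)  # 20 recordings, 5 clips each

# GroupShuffleSplit keeps all clips of a recording on the same side of the
# split, so the validation score is not inflated by near-duplicate samples.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, val_idx = next(splitter.split(X, y, groups=groups))
X_tr, X_val = X[train_idx], X[val_idx]
y_tr, y_val = y[train_idx], y[val_idx]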
Random forests can estimate the importance of each feature. This is done by scrambling the features one at a time and observing how much the accuracy degrades: if a feature is important, scrambling it drops the accuracy a lot, while scrambling an unimportant feature has only a minor effect on performance. In the lecture, we looked at an example where we estimated the feature importances for the Kaggle competition data:
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from utils import load_data, extract_features
import matplotlib.pyplot as plt

if __name__ == "__main__":

    path = "audio_data"
    X_train, y_train, X_test, y_test, class_names = load_data(path)

    # Axes:
    # 2 = average over time
    # 1 = average over frequency
    # None = average over both and concatenate
    F_train = extract_features(X_train, axis = None)
    F_test = extract_features(X_test, axis = None)

    # Train random forest model and evaluate accuracy
    #model = RandomForestClassifier(n_estimators = 100)
    model = ExtraTreesClassifier(n_estimators = 100)
    model.fit(F_train, y_train)

    y_pred = model.predict(F_test)
    acc = accuracy_score(y_test, y_pred)
    print("Accuracy %.2f %%" % (100.0 * acc))

    # Plot feature importances
    importances = model.feature_importances_
    plt.bar(range(len(importances)), importances)
    plt.title("Feature importances")
    plt.show()
The resulting feature importances are plotted in the graph below. Here, the first 501 features are the frequency-wise averages at each of the 501 time points. The last 40 features are the corresponding averages along the time axis. It can be clearly seen that the time-averages are more significant for prediction accuracy.
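Note that the script above uses the importances stored in feature_importances_, which scikit-learn computes from impurity decreases. The scrambling procedure described earlier corresponds to permutation importance; a minimal sketch of that variant, assuming the fitted model and the F_test / y_test arrays from the script above, could look like this:

# Hedged sketch: permutation importance on the held-out data, reusing
# `model`, `F_test` and `y_test` from the script above.
from sklearn.inspection import permutation_importance

result = permutation_importance(model, F_test, y_test,
                                n_repeats=10, random_state=0)

# Features whose shuffling hurts accuracy the most get the largest scores.
plt.bar(range(len(result.importances_mean)), result.importances_mean)
plt.title("Permutation feature importances")
plt.show()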
After the random forest part, we briefly looked at other ensembles: AdaBoost, GradientBoosting and Extremely Randomized Trees. We also mentioned xgboost, which has often been a winner in Kaggle competitions.
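The scikit-learn versions of these ensembles share the same fit/predict interface, so they can be swapped in directly. A minimal sketch, assuming the F_train / y_train / F_test / y_test arrays from the earlier script (xgboost is left out here since it lives in its own package):

# Hedged sketch: compare a few ensembles on the same features.
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

for name, clf in [("AdaBoost", AdaBoostClassifier(n_estimators=100)),
                  ("GradientBoosting", GradientBoostingClassifier(n_estimators=100)),
                  ("ExtraTrees", ExtraTreesClassifier(n_estimators=100))]:
    clf.fit(F_train, y_train)
    print("%s accuracy: %.2f %%" % (name, 100.0 * clf.score(F_test, y_test)))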
In the second hour, we started the neural network part of the course. The nets are based on mimicking the functionality of the human brain, although they have diverged quite far from the original idea. Among the topics, we discussed the differences between 1990s nets and modern nets (more layers, more data, more exotic layers). At the end of the lecture, we looked at how Keras can be used to train a simple 2-layer dense network.
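As a rough illustration of what such a network looks like, here is a minimal sketch of a 2-layer dense classifier in Keras (the TensorFlow-bundled version), assuming the flattened feature matrices and integer labels from the earlier script; the layer sizes and training settings are illustrative, not the exact ones used in the lecture.

# Hedged sketch: a simple 2-layer dense network, reusing F_train / y_train,
# F_test / y_test and class_names from the earlier script.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

num_classes = len(class_names)

net = Sequential([
    Dense(128, activation="relu", input_shape=(F_train.shape[1],)),  # hidden layer
    Dense(num_classes, activation="softmax"),                        # output layer
])
net.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",  # assumes integer labels
            metrics=["accuracy"])
net.fit(F_train, y_train, epochs=20, batch_size=32,
        validation_data=(F_test, y_test))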