Recurrent networks differ from ordinary nets in that they remember their past states. This makes them attractive for sequence processing (e.g., audio signals or text sequences).
An example of an LSTM network in action is shown in the picture, which is originally from our recent paper. In that case, we detected different audio classes from a microphone signal after transforming it into a spectrogram representation.
It is possible to take individual spectrogram time frames and recognize them independently. However, this usually creates jitter in the event roll (classes turning on and off). Since the LSTM remembers the past, it produces a much smoother output.
For more information on the theory of recurrent nets, I recommend studying Andrej Karpathy's already classic blog post: The Unreasonable Effectiveness of Recurrent Neural Networks.
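To make the jitter idea concrete, here is a minimal sketch that counts how many times each class switches on or off in an event roll. The function name `count_transitions` and the two small example rolls are made up for illustration; they are not taken from the paper.

```python
import numpy as np

def count_transitions(event_roll):
    """Count on/off switches per class.

    event_roll: binary array of shape (n_classes, n_frames),
    1 = class active in that time frame.
    """
    roll = np.asarray(event_roll).astype(int)
    # A switch happens wherever two consecutive frames differ.
    return np.abs(np.diff(roll, axis=1)).sum(axis=1)

# Hypothetical decisions for a single class over 10 frames.
jittery = np.array([[0, 1, 0, 1, 1, 0, 1, 1, 1, 0]])  # frame-by-frame decisions
smooth = np.array([[0, 0, 0, 1, 1, 1, 1, 1, 1, 0]])   # one clean event

print(count_transitions(jittery))  # [6]
print(count_transitions(smooth))   # [2]
```

A lower count for the same underlying event means a cleaner event roll, which is roughly the effect one hopes to get from a model that sees the past frames.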
At the end of the second hour, we looked at recent highlights in deep learning; we will continue with these on Wednesday.
We also tested how well a recurrent network suits our competition task. The code below is an example of how to do this.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelBinarizer
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.optimizers import SGD

# Load and binarize the data
# (load_data is a helper function that is not shown here)
path = "audio_data"
X_train, y_train, X_test, y_test, class_names = load_data(path)

lb = LabelBinarizer()
lb.fit(y_train)
y_train = lb.transform(y_train)
y_test = lb.transform(y_test)
num_classes = y_train.shape[1]

# Shuffle dimensions.
# Originally the dims are (sample_idx, frequency, time)
# Should be like: (sample_idx, time, frequency), so that
# LSTM runs along 'time' dimension.
X_train = np.transpose(X_train, (0, 2, 1))
X_test = np.transpose(X_test, (0, 2, 1))

# Define the net. Just 1 LSTM layer; you can add more.
model = Sequential()
model.add(LSTM(128, return_sequences = False,
               input_shape = X_train.shape[1:]))
model.add(Dense(num_classes, activation = 'softmax'))

# Define optimizer such that we can adjust the learning rate:
optimizer = SGD(lr = 1e-3)

# Compile and fit
model.compile(optimizer = optimizer,
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])
history = model.fit(X_train, y_train, epochs = 30,
                    validation_data = (X_test, y_test))

# Plot learning curve.
# History object contains the metrics for each epoch.
# If you want to use plotting on remote machine, import like this:
#
# > import matplotlib
# > matplotlib.use("PDF") # Prevents crashing due to no window manager
# > import matplotlib.pyplot as plt
#
# Note: newer Keras versions name the keys 'accuracy' and
# 'val_accuracy' instead of 'acc' and 'val_acc'.
fig, ax = plt.subplots(2, 1)

ax[0].plot(history.history['acc'], 'ro-', label = "Train Accuracy")
ax[0].plot(history.history['val_acc'], 'go-', label = "Test Accuracy")
ax[0].set_xlabel("Epoch")
ax[0].set_ylabel("Accuracy")  # Keras reports a fraction (0...1), not %
ax[0].legend(loc = "best")
ax[0].grid(True)

ax[1].plot(history.history['loss'], 'ro-', label = "Train Loss")
ax[1].plot(history.history['val_loss'], 'go-', label = "Test Loss")
ax[1].set_xlabel("Epoch")
ax[1].set_ylabel("Loss")
ax[1].legend(loc = "best")
ax[1].grid(True)

plt.tight_layout()
plt.savefig("Accuracy.pdf", bbox_inches = "tight")
The learning curve that the script produces is shown on the right. The accuracy is growing, but relatively slowly. This could be improved by adding 1-2 more LSTM layers and making them bigger (e.g., 128 -> 512 nodes).
Note that training on a CPU is very slow. Here are some numbers:
- CPU: about 30 minutes per epoch.
- Tesla K80 GPU: about 80 seconds per epoch.
- Tesla K80 GPU with CuDNN: about 11 seconds per epoch. Simply replace "LSTM" with "CuDNNLSTM" in the above code.
- A Tesla K80 is about 3x slower than a high-end gaming GPU, so with a GTX1080Ti you might reach 5 s / epoch.
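To put the numbers above in perspective, the small sketch below converts them into total training time for the 30 epochs used in the script, and into speedup relative to the CPU. The per-epoch times are the rough figures from the list; the dictionary and function names are only illustrative.

```python
# Approximate seconds per epoch, taken from the list above.
seconds_per_epoch = {
    "CPU": 30 * 60,
    "Tesla K80 GPU": 80,
    "Tesla K80 GPU with CuDNN": 11,
}
EPOCHS = 30  # same number of epochs as in the training script

def total_minutes(seconds, epochs=EPOCHS):
    """Total training time in minutes for the given per-epoch time."""
    return seconds * epochs / 60

def speedup_over_cpu(seconds):
    """How many times faster than the CPU figure."""
    return seconds_per_epoch["CPU"] / seconds

for name, sec in seconds_per_epoch.items():
    print(f"{name}: {total_minutes(sec):.1f} min in total, "
          f"{speedup_over_cpu(sec):.1f}x the CPU speed")
```

With these figures, a full run takes about 15 hours on the CPU, 40 minutes on the K80, and 5.5 minutes with CuDNN.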
If we add another recurrent layer, change LSTM to GRU, and increase the layer size to 256, we get a learning curve as shown on the right.
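The exact script for this variant is not shown here, so the following is only a sketch of what such a network could look like. It assumes the `tensorflow.keras` API (so the learning rate argument is `learning_rate` instead of `lr`), an input of 501 time frames by 40 frequency bands (inferred from the convnet input shape further down), and a made-up default of 10 classes. The function name `build_gru_model` is hypothetical.

```python
from tensorflow import keras

def build_gru_model(time_steps=501, n_freq=40, num_classes=10, units=256):
    """Two stacked GRU layers followed by a softmax classifier.

    All default values are assumptions for illustration only.
    """
    model = keras.Sequential([
        keras.Input(shape=(time_steps, n_freq)),
        # Every recurrent layer except the last must return the full
        # sequence, so that the next recurrent layer gets one vector
        # per time step.
        keras.layers.GRU(units, return_sequences=True),
        keras.layers.GRU(units),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=1e-3),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_gru_model()
model.summary()
```

The main thing to remember when stacking is `return_sequences=True` on the lower layer; without it, the second GRU would receive a single vector instead of a sequence and Keras would raise an error.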
Additionally, below is an example of setting up a convnet for the competition.
from keras.layers import Conv2D, MaxPool2D, Flatten

# Note: this assumes X_train is in the original layout
# (sample_idx, frequency, time) = (N, 40, 501), i.e., without the
# transpose used for the LSTM. Add a channel dimension for Conv2D.
X_train = X_train[..., np.newaxis]
X_test = X_test[..., np.newaxis]

model = Sequential()

model.add(Conv2D(filters = 32,
                 kernel_size = (5, 5),
                 activation = 'relu',
                 padding = 'same',
                 input_shape = (40, 501, 1)))
model.add(MaxPool2D(2, 2))

model.add(Conv2D(filters = 32,
                 kernel_size = (5, 5),
                 padding = 'same',
                 activation = 'relu'))
model.add(MaxPool2D(2, 2))

model.add(Conv2D(filters = 32,
                 kernel_size = (5, 5),
                 padding = 'same',
                 activation = 'relu'))
model.add(MaxPool2D(2, 2))

model.add(Conv2D(filters = 32,
                 kernel_size = (5, 5),
                 padding = 'same',
                 activation = 'relu'))
model.add(MaxPool2D(2, 2))

model.add(Flatten())
model.add(Dense(128, activation = 'relu'))
model.add(Dense(y_train.shape[1], activation = 'softmax'))

model.summary()

optimizer = SGD(lr = 1e-3)
model.compile(optimizer = optimizer,
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])
This network reaches about 60% accuracy on the test set. Notice the clear sign of overfitting: the test loss actually increases after about 40 epochs.
Although none of the example networks reaches a magnificent accuracy, they have a lot of potential if tuned appropriately.
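One simple way to act on this observation is to look for the epoch where the test loss is lowest and stop (or pick the model) there. The sketch below does this for a list such as `history.history['val_loss']`; the function name `best_epoch` and the loss values are invented for illustration and are not results from the networks above.

```python
def best_epoch(val_loss):
    """Return the 0-based index of the epoch with the lowest loss."""
    return min(range(len(val_loss)), key=val_loss.__getitem__)

# Made-up loss curve: improves first, then starts to increase.
example_val_loss = [1.90, 1.60, 1.40, 1.30, 1.35, 1.45, 1.60]

epoch = best_epoch(example_val_loss)
print(f"Lowest test loss {example_val_loss[epoch]:.2f} at epoch {epoch + 1}")
```

Keras also provides a `keras.callbacks.EarlyStopping` callback that monitors `val_loss` during training and stops once it no longer improves, which saves you from training the extra epochs in the first place.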
Deep learning is not plug and play.