Today I will be talking about the third project: a Recurrent Neural Network (RNN) that uses LSTM (Long Short-Term Memory) cells to generate text.
The principle is the same as text prediction: given some words, the model gives us a list of possible words to come next. To create this model, we use a dataset of lines from The Simpsons, taken from scenes set at Moe’s Tavern, and we use it to generate a new scene.
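To make the "list of possible words" idea concrete, here is a minimal sketch in plain Python. The probabilities are made up for illustration and the helper name top_candidates is hypothetical; in the real project these scores would come from the trained network.

```python
def top_candidates(probabilities, k=3):
    """Return the k most likely next words, most likely first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:k]]


# Hypothetical scores for the word that follows some input text.
# These numbers are invented, not taken from the trained model.
next_word_probs = {"homer": 0.5, "moe": 0.2, "beer": 0.25, "the": 0.05}

candidates = top_candidates(next_word_probs, k=2)
print(candidates)
```

To generate a whole scene, you would pick one of the candidates, append it to the input, and ask for the next word again.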
Recurrent Neural Networks are neural networks that keep some information from the previous input, as opposed to a “plain” neural network that only takes the current input into consideration.
To give an example, the OpenAI team uses LSTMs to train their bots to play Dota 2. You can also use RNNs for speech recognition, text prediction, or recommending what to watch next on Netflix.
At the beginning of this part of the course, we reviewed plain old neural networks, feed-forward and backpropagation, in order to understand how RNNs work and what makes them different.
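The difference between a plain network and a recurrent one can be sketched in a few lines of Python. This is a simplified single-unit RNN with made-up weights, not an LSTM and not the code from the project; the names feed_forward_step, recurrent_step and run_rnn are my own for illustration.

```python
import math


def feed_forward_step(x, w_in):
    """A plain unit: the output depends only on the current input."""
    return math.tanh(w_in * x)


def recurrent_step(x, hidden, w_in, w_hidden):
    """A recurrent unit: the output also depends on the previous hidden state."""
    return math.tanh(w_in * x + w_hidden * hidden)


def run_rnn(inputs, w_in=0.5, w_hidden=0.8):
    """Feed a sequence through the recurrent unit, carrying the state along."""
    hidden = 0.0  # nothing has been seen yet
    states = []
    for x in inputs:
        hidden = recurrent_step(x, hidden, w_in, w_hidden)
        states.append(hidden)
    return states


# Both sequences end with the same input (0.0), but the first input differs.
states_a = run_rnn([1.0, 0.0])
states_b = run_rnn([0.0, 0.0])

# The plain unit gives the same answer for the last input in both cases,
# while the recurrent unit still "remembers" the earlier 1.0.
print(feed_forward_step(0.0, 0.5), states_a[-1], states_b[-1])
```

An LSTM cell follows the same idea but adds gates that decide what to keep and what to forget from the carried state.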
Each part of the program is introduced by a different teacher. This gives an interesting touch to the program: everyone has a slightly different way of teaching, so even hearing the same thing again helps, because it is explained by someone new.
This part also includes a lesson about hyperparameter optimization. This was really useful and provided many tips on how to choose the right parameters, from the network depth to the number of epochs.
Project: Generating TV Scripts
What you will implement is very similar to all the previous exercises on RNNs. This time, the Jupyter Notebook also provides unit tests, which will help you A LOT to be sure you are not making big mistakes.
Still, the unit tests can’t cover all cases. The first issue I had was that my architecture was not performing as well as expected. My loss was over 4.0, and I was expected to get something smaller than 1.0. It took me a while to figure out that I was leaving a parameter at its default value rather than explicitly setting it to None. Once I fixed that, my loss went down to 0.046!
I feel that such small mistakes are really easy to make, and almost impossible to spot if you don’t have a lot of practice.
Again, the best tip I can give you is to keep the previous examples at hand.
This project required me to use TensorFlow directly, not Keras or another higher-level library. That makes things a bit more complicated, but learning the basics is important for knowing how to use the higher-level APIs.
You can see the results on GitHub:
And this is an extract of my results:
homer_simpson:(reading)” can i borrow a feeling?"(laughs)” can i borrow a feeling?"(still laughing) there's your picture on the front.
kirk_van_houten: go ahead, homer. laugh at me.
homer_simpson: i already did… this bar's only for real americans. and people on permanent visas, like me. what? what are you all lookin’ at? i'm dutch. eh, forget all of you…
kent_brockman: so that you can just waltz.
homer_simpson: but i was just tellin’ all the guys how losing the power of speech made me a better man.
lindsay_naegle: i couldn't agree more. you're today's modern, enlightened man– the kind we television producers have been booking since the mid-seventies.
homer_simpson: hey, moe, if you're tired of bein’ an eyesore, why not get some plastic surgery?
moe_szyslak: oh, oh, hey, hey. no outside suds!
homer_simpson: i'm sorry, moe. marge won't let me in?
moe_szyslak: yeah, would you
The next project is about Generative Adversarial Networks, in which I will get to build a network that generates faces!