LSTM Generator

LSTM stands for "Long Short-Term Memory." It is a type of recurrent neural network. A Long Short-Term Memory recurrent neural network (LSTM RNN) was recently used to write a screenplay, which was then directed, acted, produced, and submitted to a film festival. It isn't required viewing, but it may be worth watching.

Article on Sunspring, a sci-fi short film

Recurrent Neural Networks

The idea behind a recurrent neural network is to take the output of a node at one step and use it as part of the input at the next step. Where a classic fully-connected network might look like this:

classic network

A recurrent network might look like this: (note the arrows coming out of the last output node)

recurrent network

Recurrent neural networks are a large class of neural networks.

LSTM model

LSTM unit

The LSTM model uses a neuron unit that is able to retain its state over time, as well as reset that state depending on the input signals and trainable parameter weights. This allows it to keep track of information over a period of time.
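
To make that gated, persistent state a little more concrete, here is a minimal numpy sketch of a single LSTM step. This is not TFLearn's implementation, just the standard textbook equations; the weight matrices W and U and the biases b stand in for the trainable parameters mentioned above.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # Each gate is computed from the current input and the previous hidden state.
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])  # input gate: how much new information to write
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])  # forget gate: how much old state to keep
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])  # output gate: how much state to expose
    g = np.tanh(W['g'] @ x + U['g'] @ h_prev + b['g'])  # candidate values to add to the cell state
    c = f * c_prev + i * g        # the cell state carries information forward through time
    h = o * np.tanh(c)            # the hidden state is what the next step (and next layer) sees
    return h, c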

The power of LSTM is used for many problems that are continuous over time: speech recognition and dictation, handwriting recognition that analyzes individual strokes over time rather than pixel images, and pretty much anything else that varies in time. LSTM's memory cells provide a way for information from the past to affect the present. For instance, in speech recognition it can be difficult to classify the raw audio signal into individual "phonemes," or sounds. Often the sounds of two words are very similar; however, the surrounding context may help determine the correct word or sound. We do this all the time in conversation, using what has just been said to help us figure out what is being said, since some words and sounds are more likely to follow others. An LSTM network can be trained to exploit those associations through the same gradient descent method that we have already discussed, with some modifications to deal with gradients over time, such as the backpropagation through time algorithm.

Given this powerful tool, we can also process text or musical notes as a signal that varies in time. For music, as time progresses, the note changes. For text, we can think of starting at the first word or character, and as 'time' progresses, more text is read sequentially, and the words change. In this way we can also use LSTM and recurrent networks to work with sequences of text, words, or anything really.

With the MNIST data set, we were interested in classifying a given input, using a supervised learning system where all of our inputs were already labeled with the "correct" value. The goal of the network was to produce the correct classification for any given input. In this example, our network will be fed some text, and the goal will be to predict what comes next. After every prediction, we can feed that prediction back into the network and ask it to produce more predictions. In this way, we will use a neural network to generate new text that is similar to the text it was trained on. If the network has only ever seen ranting forum posts, it will predict text that is similar to ranting forum posts. In our example, we will use a book from Project Gutenberg's free book repository.
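
Conceptually, generation is just a feedback loop, sketched below. The predict_next_char function here is hypothetical, standing in for whatever trained model you have; TFLearn's SequenceGenerator, used later on this page, handles this loop for us.

def generate_text(seed, n_chars, predict_next_char):
    # Autoregressive generation: each predicted character is appended to the
    # context and fed back in as input to produce the next prediction.
    text = seed
    for _ in range(n_chars):
        next_char = predict_next_char(text)  # ask the network what comes next
        text += next_char                    # feed the prediction back in
    return text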

Code

This code is adapted from TFLearn's own documentation examples. Before you get started, you will need a corpus of input data. I chose to use "Grimms' Fairy Tales", downloaded from Project Gutenberg. You can use another book or input source if you'd like.

As for the code, first, import tflearn and the modules we will use.

import tflearn
from tflearn.data_utils import *

Next, indicate the path to the input file. We will also need to know how many characters will be in a 'sequence'. The network will work with chunks of text while it trains, and this maxlen variable will be used to set the size of those chunks.

path = "corpus.txt" #path to the input file
maxlen = 100

And of course, load the data using tflearn's helpful utilities. This will process the large text file into a list of sequences (encoded as vectors) which will be used as the actual training input. Note that char_idx is a dictionary of all the characters that appeared in the input text. Our network should only output these characters, so we will hold on to it for now.

X, Y, char_idx = \
    textfile_to_semi_redundant_sequences(path, seq_maxlen=maxlen, redun_step=2)
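
If you are curious what the utility actually produced, a quick sanity check is to print the shapes (this assumes X and Y come back as numpy arrays; the exact counts depend on your corpus):

print(len(char_idx))  # number of distinct characters seen in the corpus
print(X.shape)        # (number of sequences, maxlen, number of characters), one-hot encoded
print(Y.shape)        # (number of sequences, number of characters), the character following each sequence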

Next, build the network model using LSTM layers and dropout layers.

input = tflearn.input_data([None, maxlen, len(char_idx)])   # input is a sequence of *maxlen* one-hot encoded characters
lstm1 = tflearn.lstm(input, 256, return_seq=True)           # LSTM layer (returns the full sequence for the next LSTM layer)
dropout1 = tflearn.dropout(lstm1, 0.5)                      # dropout to avoid overfitting
lstm2 = tflearn.lstm(dropout1, 256)                         # LSTM layer
dropout2 = tflearn.dropout(lstm2, 0.5)                      # dropout to avoid overfitting
output = tflearn.fully_connected(dropout2, len(char_idx), activation='softmax')
optimizer = tflearn.regression(output, optimizer='adam', loss='categorical_crossentropy',
            learning_rate=0.001)

In previous examples we used tflearn to train a DNN (Deep Neural Network); this time we want to use it differently. tflearn has a utility for sequence generation, so we will use that rather than the generic DNN model. Note that we pass in the dictionary of characters from before.

# Use TFlearn's sequence generator
model = tflearn.SequenceGenerator(optimizer, dictionary=char_idx,
        seq_maxlen=maxlen,
        clip_gradients=5.0,
        checkpoint_path='guten')

We now train the network. This will take some time, on the order of several hours. To speed things up, a pretrained model is provided. Download and load the pretrained model from GitHub. Note that CSX does not work with the pretrained model, sorry!

model.load('guten-model')

A note about the 'temperature' setting for generating text: the lower the temperature, the more likely the generator is to pick the "most likely" value out of the network. In cases like the MNIST task, this is exactly what we wanted: the most likely class from the network. However, with a generative task like this, that can lead to extremely redundant text. Either the generator can get stuck in a loop, or it can start copying its training data word for word.
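
For intuition, temperature rescales the network's predicted character probabilities before sampling. The sketch below shows the usual formulation, not TFLearn's internal code.

import numpy as np

def sample_with_temperature(probs, temperature=1.0):
    # Low temperature sharpens the distribution (near-greedy, repetitive picks);
    # high temperature flattens it (more diverse, riskier picks).
    probs = np.asarray(probs, dtype=np.float64)
    logits = np.log(probs + 1e-12) / temperature
    rescaled = np.exp(logits) / np.sum(np.exp(logits))
    return np.random.choice(len(rescaled), p=rescaled)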

# and train!
for i in range(50):   # train for 50 epochs, printing some sample output in between each one
    seed = random_sequence_from_textfile(path, maxlen) #start with a random prompt from the input text
    print("-- TESTING...")
    print("-- Test with temperature of 1.0 --")
    print(model.generate(600, temperature=1.0, seq_seed=seed)) #generate 600 characters with a lot of diversity
    print("-- Test with temperature of 0.5 --")
    print(model.generate(600, temperature=0.5, seq_seed=seed)) # generate 600 characters with less diversity
    print("-- Test with temperature of 0.1 --")
    print(model.generate(600, temperature=0.1, seq_seed=seed)) # generate 600 characters with much less diversity
    model.fit(X, Y, validation_set=0.2, batch_size=128,        # finally, train for a while (1 epoch - looking at *every* sequence generated from the input once)
            n_epoch=1, run_id='guten')

And start generating text! After training on CSX overnight, here's what my generator spit out (note the first 100 characters are the prompt from the input text, not generated):

-- Test with temperature of 0.5 --
not I,’ and went to another door; but when the
people heard the jingling of the bells they would not was all the hand to she had so her hand, he spranged her of the could
be your hand of him the hight away the wable but her to her for him all the sheat again the scame of child homet
o the trae
there to the nore and but the fox, and the wife as he said, ‘I should so no said, ‘When the butted to the father
 as he tore the king to the carreated the windon the said, ‘There they had the tanking she said, ‘What was a man
 a put him to you the fing on the shall a shall with her hand and tree in the could to be down the find on the w
all in the man and the sroper they was beautiful of the hand and the

Here are some results generated by someone else, using the same technique, trained on Shakespeare quotes:

(source here)

VIOLA:
Why, Salisbury must find his flesh and thought
That which I am not aps, not a man and in fire,
To show the reining of the raven and the wars
To grace my hand reproach within, and not a fair are hand,
That Caesar and my goodly father's world;
When I was heaven of presence and our fleets,
We spare with hours, but cut thy council I am great,
Murdered and by thy master's ready there
My power to give thee but so much as hell:
Some service in the noble bondman here,
Would show him to her wine.

KING LEAR:
O, if you were a feeble sight, the courtesy of your law,
Your sight and several breath, will wear the gods
With his heads, and my hands are wonder'd at the deeds,
So drop upon your lordship's head, and your opinion
Shall be against your honour.

Analysis

I think it's important now to think about what we've done here. Without providing any explicit information about English grammar, word choice, or format, a computer program has produced results that are passable English text (or might have been, if we had trained it for much longer). However, it is mostly gibberish. There are no coherent thoughts or ideas. It simply doesn't make sense, even though at first glance it does appear similar to Shakespeare's writing.

However, the feat of following English grammar rules relatively successfully is a testament to the power of modern AI methods. Natural language processing has been a particularly tricky subject for computers until recently. Attempts at hardcoding every rule and syntax of our grammar have led to relatively low success rates for a vast amount of effort. With a neural network, almost no human effort was required, and only a relatively small amount of compute time (a few hours on CSX).

Another interesting thing to note is what lies in the actual generated content. Here is an in-depth movie review of Sunspring, the AI-produced screenplay. Like our Shakespeare quotes, the short film lacks real coherence and meaning, yet the analysis is strikingly thoughtful. What is the takeaway from this? Is there some subtle meaning or understanding within this seemingly empty computer program? Or are we just that good at projecting our own thoughts and values onto anything and everything? While this may be a simple example, there is a lot to think about here.

References