FUN WITH RECURRENT NEURAL NETWORKS:

Training a Recurrent Neural Network on Combined Datasets by Transfer Learning

(If you aren't already familiar with recurrent neural networks, why not see Andrej Karpathy's excellent blog?)

These days, thanks to The Wonders of Science[TM], we can train neural networks to imitate different styles of text by showing them some examples. Often the results are gibberish, but occasionally in this gibberish there is a nugget of... less gibberish. There are many fine Python libraries out there for running RNN experiments; I am using textgenrnn and fine-tuning its stock model on data of my own whimsical fancy. Here is a selection of the most interesting, perplexing, or otherwise notable outputs.

Whenever one has two good things, the natural human impulse is to smoosh them together: peanut butter and jelly, cheese and cake, jam and wasp. So too with neural networks! If a neural network can generate text based on television episode titles, and separately based on grandolinquent poppycock and balderdashery, why not train it on both together and hope it smooshes them into one unholy gestalt of awesomeness?

Would that it were so simple.

When a char-rnn generates text, it chooses each character based on the characters that came before it, drawing from the characters that were likely to follow in the training data you showed it. If some characters usually follow others in the training set, that same pattern will often repeat in the trained network. If you train it on a bunch of examples of pattern A and a bunch of examples of pattern B, and then ask the network to generate some text, some of the time it'll generate something like pattern A, and some of the time it'll generate something like pattern B. If the two patterns are different enough, it may never learn to generate examples that blend the two.
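To make that concrete, here's a toy sketch of the idea. This is not textgenrnn's actual machinery (a real char-rnn conditions on a long stretch of preceding characters via a recurrent network); it just illustrates "pick each next character from whatever tended to follow in the training data." The training text is a made-up example:

    import random
    from collections import defaultdict

    # Toy character-level model: for each character, remember which characters
    # followed it in the training text. A real char-rnn looks at much more
    # context than one character, but the sampling idea is the same.
    training_text = "the cage / the naked time / the enemy within / the man trap"

    followers = defaultdict(list)
    for prev_char, next_char in zip(training_text, training_text[1:]):
        followers[prev_char].append(next_char)

    def generate(length=40, seed="t"):
        out = seed
        for _ in range(length):
            candidates = followers.get(out[-1])
            if not candidates:  # no character ever followed this one in training
                break
            out += random.choice(candidates)  # picks in proportion to training frequency
        return out

    print(generate())

Run it a few times and you'll get plausible-ish letter sequences that echo the training text, plus plenty of gibberish, which is roughly the experience of working with these models, writ small.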

But instead of just blindly smooshing our datasets together like rocky and road, we can use the technique of transfer learning. This is a very popular strategy in machine learning, where training data is at a premium. To train a neural network up from scratch can take an enormous amount of training data, which needs to be cleaned, curated, annotated, brushed, and washed. Who has that kind of time? Instead, practitioners perform transfer learning:

Gather just a small amount of data of your own. Then go out and find a neural network that has already been trained on similar data, the more similar the better. Now retrain slash fine-tune this network on your own small dataset. The idea is that the network will already have learned enough useful correspondences and features from the original training data that this learning will transfer over and allow it to adapt to your own data comparatively quickly. This is tremendously popular in computer vision, for example.

It's also already being used, secretly, in these experiments: since I only have 100-500 examples in each of my datasets, I've been starting from the stock neural network in textgenrnn, which has already been trained up on a diverse corpus and thus "knows about" many different linguistic patterns already, and then fine-tuning it on my own data.

Let's try transfer learning here! Let's train the textgenrnn neural network on my list of silly words, and then fine-tune it on various other lists, but only for a few epochs, in the hopes of combining the features of the new and original datasets. This is hard to get right: if you don't train the network on the new data enough, it'll just spit out gibberish, but if you train it too much, it'll forget all about the older dataset altogether and only generate text like the new dataset. In my experiments with textgenrnn and training sets of 100-500 words, somewhere around 5 epochs generally seems to be about right. (To perform an epoch of training is to train a network on each example from your training set exactly once.)
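For the curious, the whole procedure looks roughly like this in textgenrnn. This is a sketch rather than my exact script: the file names and epoch counts below are placeholders standing in for whatever datasets and settings you're experimenting with.

    from textgenrnn import textgenrnn

    # Start from textgenrnn's stock pretrained model, which already "knows about"
    # general English character patterns.
    textgen = textgenrnn()

    # Fine-tune it thoroughly on the silly word list so that becomes our base model.
    textgen.train_from_file('silly_words.txt', num_epochs=20)
    textgen.save('silly_word_weights.hdf5')

    # The transfer learning step: reload the silly word model and nudge it toward
    # the new dataset for only a few epochs, so it doesn't forget the silly words.
    textgen = textgenrnn('silly_word_weights.hdf5')
    textgen.train_from_file('doctor_who_titles.txt', num_epochs=5)

    # Sample and hope for an unholy gestalt of awesomeness.
    textgen.generate(10, temperature=0.5)

The interesting knob is that second num_epochs: too low and the new dataset barely registers, too high and the silly words are forgotten entirely.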

First, some bonus examples of machine-generated silly words inspired by words like "apparatus" and "poltroon":

When I took my silly word network and tuned it on Star Trek episode titles, the poor network mostly just seemed confused. There were a few gems, though they don't necessarily seem very Star Trekky:

(That last is clearly a Bashir/O'Brien story.) Doctor Who titles seemed to work somewhat better. After taking the silly word network and tuning it up on Doctor Who episode titles, many of the examples are almost plausible:

However, others are gratuitously preposterous in their circumlocution: