FUN WITH RECURRENT NEURAL NETWORKS:

Fake cannabis strain names (and silly words, and Doctor Who and Star Trek titles)

(If you aren't already familiar with recurrent neural networks, why not see Andrej Karpathy's excellent blog?)

These days, thanks to The Wonders of Science[TM], we can train neural networks to imitate different styles of text by showing them some examples. Often the results are gibberish, but occasionally in this gibberish there is a nugget of... less gibberish. There are many fine Python libraries out there for running RNN experiments: I am using textgenrnn, and fine-tuning its stock model on data of my own whimsical fancy. Here is a selection of the most interesting, perplexing, or otherwise notable outputs.
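
If you want to play along at home, getting the stock model up and babbling takes only a few lines. This is a minimal sketch, assuming textgenrnn's usual API; the filename is a stand-in for whatever one-item-per-line training file you have:

    from textgenrnn import textgenrnn

    textgen = textgenrnn()   # load the pretrained stock model
    textgen.generate(5)      # print five lines of its default babble

    # fine-tuning on a file of one-line examples is a single call
    textgen.train_from_file('my_whimsical_data.txt', num_epochs=2)
    textgen.generate(5)      # now the babble should lean towards your data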

In honour of no occasion in particular, and for no special reason, I scraped a list of cannabis strains to feed to my neural network. Amateur genetics enthusiasts give their hybrids some very whimsical names, and I thought these might pair well with my list of silly words to train a neural network to generate some really primo, high-quality nonsense. There were a couple of confounding factors to consider. Firstly, the list of silly words was only ~500 items long, while the cannabis strains numbered over 10,000. If I simply combined the datasets and trained a network on the lot, the cannabis strains would so outnumber the silly words that the network would learn only the features of the strains, and nothing about the silly words. So from the outset I knew I had to employ transfer learning: train a network up on the silly words first to prime it, and then give it a little fine-tuning on the cannabis strains--just a single epoch, given the size of that dataset--to see if it could learn to generate plausible strain names infused with the spirit of codswallop and poppycock. (There's a rough sketch of the whole recipe in code a little further down.)

Secondly, there were some orthographic differences between the lists. The cannabis strains were often capitalized and included a lot of eccentric punctuation; the silly words were all lowercase and contained far fewer digits and special characters. Remember, these networks generate text character by character, conditioning each character on the ones generated so far: so once the network emits a character that occurs in only one of the training sets, it's liable to finish the name in the style of that set alone, since those are the only continuations it has ever seen follow that character. That means the network won't learn how to combine features from both sets, which is what I really want.

This happens a lot in machine learning research: whenever you want to combine data from multiple sources, you will always find that there are some eccentricities of the datasets that make them look very different, even though, morally speaking, they shouldn't be. Presence or absence of capitalization and punctuation in text, for example. Or the average brightness or dimensions of a set of images. Spurious features that distract your machines, for machines are notoriously naive and flighty. To get data into a useful form for machine learning, you must clean and process it using your living wits. In this case, to get the two datasets onto a common character set, I converted everything to lowercase and stripped out all punctuation and special characters except whitespace, before training the network first on the list of silly words and then on the list of cannabis strains.
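
Here is roughly what that looked like, as a sketch rather than gospel: the filenames are placeholders, the epoch counts are illustrative, and I'm assuming textgenrnn's usual train_on_texts/generate API.

    import re
    from textgenrnn import textgenrnn

    def normalise(name):
        # lowercase, then drop everything except letters, digits, and whitespace
        return re.sub(r'[^a-z0-9\s]', '', name.lower()).strip()

    # one item per line in each file (filenames are stand-ins)
    silly_words = [normalise(w) for w in open('silly_words.txt').read().splitlines()]
    strains = [normalise(s) for s in open('strains.txt').read().splitlines()]

    textgen = textgenrnn()                              # start from the pretrained stock model
    textgen.train_on_texts(silly_words, num_epochs=20)  # prime: only ~500 silly words, so many passes
    textgen.train_on_texts(strains, num_epochs=1)       # fine-tune: 10,000+ strains, one pass is plenty
    textgen.generate(10, temperature=0.6)               # harvest the resulting nonsense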

Let's see what we get!

Along the way, since I was training the network on them already, I generated some bonus silly words:

For kicks, I took the strains network and fine-tuned it on the list of Doctor Who episode titles too:

Then I infused the Star Trek titles as well: