One of my first experiments with natural language generation was writing
random-tony, a program that could produce new chocolate flavours. The program uses a grammar as a model of a “valid” chocolate flavour. Here are some examples:
$ python ./generate_flavours.py
blonde chocolate with marshmallows and cucumber
dark chocolate with orange
white chocolate with liquorice, rose petals and kiwi crumble
blonde chocolate with cardamom, soy beans and broth
dark milk chocolate with cola, mayonnaise and butterscotch
white chocolate with sesame seeds, pumpkin and black pepper
dark chocolate with jasmine and honey
extra dark chocolate with apricot and broth
dark milk chocolate with caramel and pear crumble
extra dark chocolate with cucumber
When we generate text based on a context-free grammar, the grammar acts as a model for valid language production. So before I could automatically generate text output, I needed a program that can parse and interpret a context-free grammar.
The grammar parser can parse any context-free grammar; the generator takes a grammar and a start symbol, and produces language by applying grammar rules to the start symbol until there are no further rewrites possible. If you want to learn more about grammars and rewriting, check out the Tracery tutorial.
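The rewriting loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual random-tony code: the function name `generate`, the dictionary layout and the toy grammar are all assumptions made for the example. A symbol that has rules in the grammar is replaced by one randomly chosen alternative; a symbol without rules is treated as a finished word.

```python
import random


def generate(grammar, symbol, rng=random):
    """Rewrite `symbol` until only terminal words are left.

    `grammar` maps a non-terminal to a list of alternatives; each
    alternative is a list of symbols. Anything that is not a key of
    `grammar` counts as a terminal. (Hypothetical layout for this sketch.)
    """
    if symbol not in grammar:
        return [symbol]  # terminal: no further rewrites possible
    alternative = rng.choice(grammar[symbol])  # pick one rule at random
    words = []
    for part in alternative:
        words.extend(generate(grammar, part, rng))
    return words


# Toy grammar invented for this example, much smaller than the real one.
TOY_GRAMMAR = {
    "FLAVOUR": [["CHOCOLATE", "with", "INGREDIENT"]],
    "CHOCOLATE": [["dark", "chocolate"], ["white", "chocolate"]],
    "INGREDIENT": [["orange"], ["honey"], ["FRUIT", "crumble"]],
    "FRUIT": [["kiwi"], ["pear"]],
}

print(" ".join(generate(TOY_GRAMMAR, "FLAVOUR")))
```

Because every rule is picked at random, each run can print a different flavour. Passing a seeded `random.Random` instance as `rng` makes the output repeatable, which is handy for testing.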
As input for the parser/generator, I wrote a context-free grammar to describe chocolate flavours. I was inspired by the crazy limited edition flavours that Tony’s Chocolonely (a famous brand of fair-trade chocolate in the Netherlands) releases each year. Tony’s crazy flavours have caught on, and now almost every supermarket in the Netherlands sells chocolate with weird combinations of flavours.
The grammar defines valid chocolate flavours. I have defined a flavour as a type of chocolate (such as dark, milk or white) plus one, two or three special ingredients. A chocolate bar without any ingredients would also be a valid flavour in the real world, but that would be a bit too boring, so I left it out of my definition.
LIMITED_EDITION -> CHOCOLATE INGREDIENTLIST
INGREDIENTLIST -> INGREDIENT | INGREDIENT INGREDIENT | INGREDIENT INGREDIENT INGREDIENT
I looked at the flavours made by Tony’s Chocolonely and decided that an ingredient can be a number of things, such as fruits, nuts, vegetables and an assortment of other random weird things.
INGREDIENT -> FRUIT | FRUIT CRUMBLE | NUTS | DRINK | CANDY | SPICE | VEGETABLE | MISC
The fun part is filling in concrete examples of all the categories of ingredients specified above. For example, here is the list of ‘MISC’ ingredients.
MISC -> honey | coffeecrunch | butterscotch | wasabi | red curry | green curry | jalapeno peppers | balsamic vinegar | lavender | rose petals | cornflakes | mayonnaise | ketchup | soy sauce | broth | whipped cream | liquorice | salmon
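Rules in this notation are easy to read into a data structure. The sketch below shows one possible way to do it, assuming one rule per line, `->` between the left-hand side and the alternatives, `|` between alternatives and whitespace between symbols. The function name `parse_grammar` is made up for this example and the MISC list is shortened; the real parser may work differently.

```python
def parse_grammar(text):
    """Turn lines like 'A -> b c | D' into {'A': [['b', 'c'], ['D']]}.

    Assumed format (for this sketch only): one rule per line,
    '->' after the left-hand side, '|' between alternatives.
    """
    grammar = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, _, body = line.partition("->")
        # Each alternative becomes a list of whitespace-separated symbols,
        # so 'red curry' is stored as the two words 'red' and 'curry'.
        alternatives = [alt.split() for alt in body.split("|")]
        grammar.setdefault(head.strip(), []).extend(alternatives)
    return grammar


RULES = """
INGREDIENTLIST -> INGREDIENT | INGREDIENT INGREDIENT | INGREDIENT INGREDIENT INGREDIENT
MISC -> honey | red curry | rose petals
"""

grammar = parse_grammar(RULES)
print(grammar["MISC"])  # [['honey'], ['red', 'curry'], ['rose', 'petals']]
```

Storing multi-word ingredients as a sequence of words keeps the parser simple: joining the generated words with spaces gives back "red curry" without any special handling.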
So, why bother with writing a context-free grammar for NLG? Mostly, it’s fun. The random, combinatorial approach to generation can lead to weird, crazy and unexpected outputs. A similar approach is often used for procedural content generation for games. If you want some examples, take a look at the procedural histories of Dwarf Fortress, the loot in Diablo, and the NPCs of Caves of Qud.
As an unexpected bonus, random-tony helped me land my current job. I developed it before I started my PhD in natural language generation, just for fun, but it helped me show my prospective supervisors that I was really interested in the main topic of research. This gave me a head start in the application procedure. Now, two years later, this little generator is helping me with my research too! Since my PhD project is about personalized (adaptive) natural language generation, I could probably use a modified version of the program to conduct experiments.
Judith van Stegeren is a Dutch computer scientist. She is working as a PhD candidate at the University of Twente, where she researches natural language generation for the video games industry. She occasionally works as a consultant in data engineering for textual data.