Interview: NaNoGenMo and coherence
A few months ago I was interviewed by Wired about NaNoGenMo 2019, an online programming challenge where participants try to build a novel generator in 30 days. The interview was part of the background research for this Wired article.
I figured some people might be interested in this interview too, so I decided to put the questions and their answers up on my blog. I've rewritten some of my answers to clarify them -- the reporter who interviewed me had read my NaNoGenMo paper, but perhaps you haven't. :)
Why the focus on coherence in your text generation research?
I focused on generating coherent texts because that is an open problem within natural language generation (NLG) research.
It is easy to let a computer generate loads of gibberish, but quite hard to generate a story that looks "natural", as if it were written by a human. The longer the generated text, the harder it is to keep the whole coherent. With current research, we can generate coherent sentences, and if we put in some effort, coherent paragraphs.
(Note that this was written before the release of GPT-2 and GPT-3. However, even these models don't solve the coherence problem completely: GPT-2 and GPT-3 work primarily for the English language, and their language models are hard to replicate, since you would need a bigger dataset and more computing power than most people have access to.)
Examples of things that can go wrong during text generation:
- the text generator loses track of the topic it's discussing
- the generator gets stuck in a loop and generates the same phrase over and over again
- the generator forgets to mention an important piece of information, confusing the reader

All these things can completely derail the flow of the text.
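The loop failure in particular is easy to spot mechanically. A minimal sketch of such a check (the function name and the 4-gram threshold are my own choices, not from any particular generator):

```python
def has_repeated_ngram(text, n=4):
    """Return True if any sequence of n words occurs more than once."""
    tokens = text.lower().split()
    seen = set()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if gram in seen:
            return True
        seen.add(gram)
    return False

# A generator stuck in a loop shows up as a repeated 4-gram:
print(has_repeated_ngram("the knight rode on and the knight rode on and on"))  # → True
```

A real generator could run a check like this on its own output and backtrack when it detects a loop, though that treats the symptom rather than the underlying coherence problem.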
Some people solve this coherence problem by giving the text generator software a text plan: a structure, a storyline, or a checklist of information that should be mentioned. However, someone (or something) needs to write that text plan, so a text plan doesn't solve the problem -- it just moves it elsewhere.
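For illustration, the text-plan approach can be sketched in a few lines of Python. The plan, templates, and facts here are all invented examples, not from any real system -- and note the catch: the plan itself still has to be written by hand.

```python
import random

# An ordered checklist of content the text must cover, in this order.
plan = ["setting", "character", "conflict", "resolution"]

# One or more candidate templates per plan step.
templates = {
    "setting": ["The story takes place in {place}.",
                "Our tale begins in {place}."],
    "character": ["{hero} lives there."],
    "conflict": ["One day, {hero} loses the {item}."],
    "resolution": ["After a long search, {hero} finds the {item} again."],
}

facts = {"place": "a small harbour town", "hero": "Mira", "item": "lighthouse key"}

def realise(plan, templates, facts):
    """Walk the plan in order, picking one template per step."""
    sentences = [random.choice(templates[step]).format(**facts) for step in plan]
    return " ".join(sentences)

print(realise(plan, templates, facts))
```

Because the plan fixes the order of the content, the output is guaranteed to mention everything and never lose its thread -- but only because a human already did the hard part.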
I try to solve this problem differently. My preferred solution is creating a text generator that can make readers believe they are reading coherent texts -- even if the texts are not coherent at all! In other words, I'm more interested in perceived coherence than actual coherence.
I got the idea by looking at the text generators of NaNoGenMo 2018. We humans are pattern matchers, and we are biased towards finding meaningful patterns even in random data. Some NaNoGenMo participants used this to their advantage by tricking the reader and playing with their expectations.
Of course, "tricking the reader" is only a good strategy when you're generating creative texts or fiction, not informative texts. My research is about generating texts for video games (mostly RPGs). The goals for those kinds of texts are fundamentally different from the goals of newspaper articles or weather reports. Factual accuracy is less important than entertainment value, aesthetics, and fit with the game narrative.
From your paper, it sounds as though the novel length and form really pushes the boundaries on coherence. I wonder if you found most people were aiming for coherence, or for other attributes?
Given the projects I've seen, I think coherence was not an important requirement to the majority of the participants. I think most people are just trying to have fun and create something cool in one month. ;)
Whether coherence is a goal depends on the output format chosen by the participants. The NaNoGenMo challenge is defined as 'create a text generator that can output a text of 50,000 words', so the output format can be anything. The novel format is not enforced by the organisers of the challenge in any way, which is part of the fun. Some participants end up generating 50,000 words of word art, poems or "lists". For word art, the visual aspect is more important than coherence. Poems and lists are microtexts, so they only need to be coherent at the word level and paragraph level, which is less challenging than generating one coherent 50,000-word text.
It also depends heavily on the objective of the person participating in NaNoGenMo: some people (like me) use it to learn about new techniques, others are trying to create procedural art. Some participants just like to be surprised by the output of their generator, and some people use text generation to circumvent the effort of writing 50,000 words by hand. These are all really different goals, and require different levels of coherence.
Do you have a sense of whether some of the newer methods, such as large language models, might make coherence easier to obtain? Or are other methods, such as hard-coding narrative structure, still the best bet?
Using a large language model can definitely save you some time when developing a generator, as basic properties of the language are already embedded in the model. This means the programmer of the generator can focus on writing a specification of the /type/ of output (what is a good dialogue, poem, story, ...), instead of also having to define the language itself (what is proper grammar, what is a sentence, what are valid words, where do we put an exclamation mark).
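As a contrast, here is what "defining the language itself" looks like without a pretrained model: a tiny hand-written grammar (an invented example) where every rule, word, and punctuation mark must be spelled out by the programmer before any output is possible.

```python
import random

# Every grammar rule and every word is hand-coded by the programmer.
grammar = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"]],
    "N": [["knight"], ["dragon"], ["castle"]],
    "V": [["sees"], ["guards"]],
}

def expand(symbol):
    """Recursively rewrite a symbol until only words remain."""
    if symbol not in grammar:  # terminal: an actual word
        return [symbol]
    words = []
    for part in random.choice(grammar[symbol]):
        words.extend(expand(part))
    return words

# Even capitalisation and the full stop are the programmer's job.
sentence = " ".join(expand("S")).capitalize() + "."
print(sentence)
```

A language model absorbs all of this (and far more) from its training data, which is why it frees the programmer to think about what a good story is instead of what a valid sentence is.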
However, large language models are not the end of all our problems. ;)
Two problems that I regularly run into in my own research both stem from the fact that new methods generally only work with large amounts of data -- for example, when we want to train a machine learning algorithm to do something for us, we need a large dataset to train it on:
- there are limited resources (datasets, open-source libraries) available for languages other than English, Spanish, Chinese, ...
- there are limited resources available for specific application domains. For example: if we want to generate text that should convey some kind of emotion or opinion, we need information about how this is expressed in text. However, most of the datasets available with this type of text contain product reviews. Product review texts do contain emotions and opinions, but a different dataset, such as one containing texts about emotional memories or personal experiences, might be more suitable for our goal. Ehud Reiter has written a good blog post about a similar problem: https://ehudreiter.com/2019/08/01/do-we-encourage-inappropriate-data-sets/
New NLG techniques are a big improvement, but even when you use them, coherence is still an open problem. From what I've seen, even new approaches, like those using deep learning, often display the same problems with coherence that I've mentioned above, especially in longer texts.
For coherence of fictitious texts: how you use the technologies is more important than which technologies you use. Humans WANT to make sense of the stuff they read. We are natural pattern-searchers, and we can use that when we generate texts, especially novels and other works of fiction. If you can harness people's goodwill towards an automatically generated text, towards a computer, that's more important than presenting factual statements in the right order (which is essential when you're generating informative, non-fiction text!). Think about the ELIZA effect described by Douglas Hofstadter: people attribute more intelligence and purpose to a computer program than is warranted by its code. If you present people with computer-generated gibberish and tell them the "author" was drunk when they wrote it, or that the author has gone mad, people start actively looking for meaning in the text, and don't mind the grammar mistakes or unfinished sentences so much.
Have you tried your hand at NaNoGenMo? If so, were you aiming for coherence, the semblance of it, or something else?
I'm planning to participate this year for the first time! Together with my colleague Lorenzo Gatti I'm going to do a science-fiction themed project, as next year I'll be working on generating text for a new science fiction game. NaNoGenMo is a perfect playground for trying out new techniques, so I'm going to play around with GPT-2. :)