Spoonerism generator Python script

2018-11-08 (Thursday) | 500 words (~3 minutes reading)

Things I've been working on.

“What’s the Difference Between” is a childish word-play game where you give clues for other players to guess a spoonerism phrase pair. For example:

Q: “What’s the difference between stress on a group, and an old locomotive?”

A: One is team strain, the other’s a steam train.
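Mechanically, a spoonerism like this swaps the beginnings of two words. Team gives up an empty prefix and strain gives up its leading "s", which produces train and steam. Here's a minimal sketch of that slicing. `swap_prefixes` is a hypothetical helper name used for illustration, not something from the script.

```python
def swap_prefixes(word_a: str, word_b: str, len_a: int, len_b: int) -> tuple[str, str]:
    """Swap the first len_a letters of word_a with the first len_b letters of word_b."""
    # word_a's prefix goes onto the rest of word_b, and word_b's prefix onto the rest of word_a.
    return word_a[:len_a] + word_b[len_b:], word_b[:len_b] + word_a[len_a:]


print(swap_prefixes("team", "strain", 0, 1))  # ('train', 'steam')
```

The same slicing with a two-letter prefix from "reliable" and none from "porter" gives ("reporter", "liable").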

I’m part of a WhatsApp group that sends these back and forth on a daily basis (usually with less innocent contents than that example).

Generating potential pairs seemed fairly automatable, so I put together a hacky Python script to produce pairs that might make good WTDB answers (i.e. a spoonerism generator).


import itertools


def n_swaps(word_a: str, word_b: str, n: int) -> frozenset:
    """
    Return the word pairs made by swapping the first 0 to n letters of word_a
    with the first 0 to n letters of word_b.

    Each pair is a frozenset, so a swap producing the same word twice
    collapses to a single element.
    """
    if n <= 0:
        return frozenset()
    swaps = set()
    # Cartesian product of prefix lengths (0 to n letters) for each word.
    # E.g. for n=2: [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
    # The product contains both (i, j) and (j, i), so a single swap per
    # combination of lengths covers every case.
    for swap_a, swap_b in itertools.product(range(n + 1), repeat=2):
        swaps.add(
            frozenset((
                word_a[:swap_a] + word_b[swap_b:],
                word_b[:swap_b] + word_a[swap_a:],
            ))
        )
    return frozenset(swaps)


def order_pair(words: tuple) -> tuple:
    """
    Order a pair consistently: longest word first, ties broken alphabetically.
    """
    # Sort alphabetically first to ensure consistency when pair is same length.
    return tuple(sorted(sorted(words), key=len, reverse=True))


def render_pair(words: tuple) -> str:
    """
    Render (swapped pair, original pair), e.g. 'reporter liable / reliable porter'.
    """
    return '{} {} / {} {}'.format(
        words[0][0], words[0][1], words[1][0], words[1][1]
    )


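class WordSet:

    def __init__(self):
        # One set per instance; a class-level `words = set()` would be
        # shared by every WordSet.
        self.words = set()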

    def add(self, word: str):
        self.words.add(word)

    def find_swaps(self, word: str, n: int = 2):
        for partner in self.words:
            if partner == word:
                continue
            if partner[:1] == word[:1]:
                continue
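            for swap in n_swaps(word, partner, n):
                # Skip swaps that collapse to one repeated word, which
                # render_pair can't display as two words.
                if len(swap) == 2 and self.validate(*swap):
                    yield (order_pair(swap), order_pair((word, partner)))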

    def validate(self, *potentials) -> bool:
        for potential in potentials:
            if potential not in self.words:
                return False
        return True


if __name__ == '__main__':
    word_set = WordSet()
    import sys
    for line in sys.stdin:
        word = line.strip().lower()
        if len(word) < 3:
            continue
        for pair in word_set.find_swaps(word):
            print(render_pair(pair))
        word_set.add(word)

https://github.com/hughgrigg/wtdb

You feed it a list of words on stdin, one per line. It compares each incoming word with all the previous ones, swapping 0-2 letters at the start of the two words, and reports any swap where both resulting words have already appeared in the list.
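To try the idea without piping a word list through stdin, here's a minimal standalone sketch of the same approach that takes a Python list instead. `spoonerisms` is a hypothetical function name; it follows the script's rules (compare each word only with earlier words, skip partners sharing a first letter, require both swapped words to have been seen already) but returns raw tuples rather than formatted lines.

```python
from itertools import product


def spoonerisms(words: list[str], n: int = 2) -> set[tuple[str, str, str, str]]:
    """Return (word, partner, new_a, new_b) tuples found by swapping 0..n leading letters."""
    seen: set[str] = set()
    found: set[tuple[str, str, str, str]] = set()
    for word in words:
        # Each word is only compared with the words that came before it.
        for partner in seen:
            # Like the script, skip partners that share a first letter.
            if partner[:1] == word[:1]:
                continue
            for len_a, len_b in product(range(n + 1), repeat=2):
                new_a = word[:len_a] + partner[len_b:]
                new_b = partner[:len_b] + word[len_a:]
                # Both results must be distinct words already in the list.
                if new_a != new_b and new_a in seen and new_b in seen:
                    found.add((word, partner, new_a, new_b))
        seen.add(word)
    return found


print(spoonerisms(["liable", "reporter", "porter", "reliable"]))
# {('reliable', 'porter', 'reporter', 'liable')}
```

Because only earlier words are searched, the order of the input decides which word is reported as the "incoming" one, which is one reason shuffling the input changes what comes out.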

Ubuntu includes e.g. /usr/share/dict/british-english as a potential list of input words, but I find a more concise list like this one works better. It’s better still if you combine it with something like this and shuffle the order each time:

awk 'length($0) > 3' words.list | sort -u | shuf | python3 wtdb.py
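If you'd rather keep the preprocessing in Python, here's a sketch of the same filter, dedupe and shuffle steps. `prepare_words` and its `min_length` and `seed` parameters are hypothetical names, not part of wtdb.py. Lowercasing before deduplicating is a small departure from the shell pipeline, which leaves case handling to the script.

```python
import random
from collections.abc import Iterable
from typing import Optional


def prepare_words(
    lines: Iterable[str], min_length: int = 4, seed: Optional[int] = None
) -> list[str]:
    """Return unique, lowercased words of at least min_length letters in random order."""
    # Normalise case before deduplicating so "Steam" and "steam" count once.
    unique = {line.strip().lower() for line in lines}
    # Sort first so that a given seed always produces the same order.
    words = sorted(word for word in unique if len(word) >= min_length)
    random.Random(seed).shuffle(words)
    return words


print(prepare_words(["cat", "steam", "Steam", "train", "train"], seed=0))
```

`min_length=4` matches the awk filter `length($0) > 3`; printing each returned word and piping the output into `python3 wtdb.py` has the same effect as the shell pipeline.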

Interestingly, most of the pairs it comes up with are based on common prefixes and suffixes, e.g. reliable porter / liable reporter. The main thing it lacks is any consideration of pronunciation: it works on spelling alone, which misses a lot of potentially good pairs and also produces questionable ones that work on paper but not when spoken aloud. It might be possible to address this with an IPA dictionary.
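As a rough sketch of the pronunciation idea, the same prefix swap can run on phoneme sequences instead of letters, with the results mapped back to spellings. `PRONUNCIATIONS` is a hypothetical four-word toy dictionary using ARPAbet-style symbols, not real IPA data, and `phonetic_swaps` is a hypothetical name; a real version would load a full pronunciation dictionary.

```python
from itertools import product

# Hypothetical toy pronunciation dictionary: word -> tuple of phonemes.
# A real version would load a full pronunciation dictionary instead.
PRONUNCIATIONS: dict[str, tuple[str, ...]] = {
    "team": ("T", "IY", "M"),
    "steam": ("S", "T", "IY", "M"),
    "strain": ("S", "T", "R", "EY", "N"),
    "train": ("T", "R", "EY", "N"),
}


def phonetic_swaps(
    word_a: str, word_b: str, pronunciations: dict[str, tuple[str, ...]], n: int = 2
) -> set[frozenset[str]]:
    """Swap the first 0..n phonemes of two words and keep results that are real words."""
    # Reverse lookup from sound to spelling; homophones collapse to the last word listed.
    by_sound = {sound: word for word, sound in pronunciations.items()}
    sound_a, sound_b = pronunciations[word_a], pronunciations[word_b]
    results: set[frozenset[str]] = set()
    for len_a, len_b in product(range(n + 1), repeat=2):
        new_a = sound_a[:len_a] + sound_b[len_b:]
        new_b = sound_b[:len_b] + sound_a[len_a:]
        if new_a in by_sound and new_b in by_sound:
            pair = frozenset((by_sound[new_a], by_sound[new_b]))
            # Skip the trivial swap back to the original words and any
            # swap that yields the same word twice.
            if len(pair) == 2 and pair != {word_a, word_b}:
                results.add(pair)
    return results


print(phonetic_swaps("team", "strain", PRONUNCIATIONS))
```

With the toy data this should find a single pair, steam and train. Matching on sounds rather than letters is what would let spellings that differ still pair up, provided the dictionary covers them.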

Tech mentioned

Python

NotesToSelf.Dev
