Spreading out words in space

A common technique for memorizing numbers is to associate numbers with words. The Major mnemonic system does this by associating consonant sounds with each digit. You form words by inserting vowels as you please.

There are many possible encodings of numbers, but sometimes you want to pick a canonical word for each number, what’s commonly called a peg. Choosing pegs for the numbers 1 through 10, or even 1 through 100, is not difficult. Choosing pegs for a larger set of numbers becomes difficult for a couple of reasons. First, it’s hard to think of words to fit some three-digit numbers. Second, you want your pegs to be dissimilar in order to avoid confusion.

Say for example you’ve chosen “syrup” for 049 and you need a peg for 350. You could use “molasses,” but that’s conceptually similar to “syrup.” If you use “Miles Davis” for 350 then there’s no confusion [1].
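The digit strings in examples like this can be checked mechanically. Here is a minimal sketch, assuming each word has already been transcribed into ARPAbet-style phonemes. The table is one common version of the Major system, the function name `major_digits` is made up for illustration, and the transcriptions are entered by hand rather than read from any dictionary file.

```python
import re

# Consonant sounds (ARPAbet symbols) mapped to Major-system digits.
# Vowels and the sounds W, Y, HH are not in the table, so they are skipped.
# NG is left out because conventions for it vary.
MAJOR_DIGITS = {
    "S": "0", "Z": "0",
    "T": "1", "D": "1", "TH": "1", "DH": "1",
    "N": "2",
    "M": "3",
    "R": "4", "ER": "4",  # ER is an r-colored vowel; counted as an r sound here
    "L": "5",
    "CH": "6", "JH": "6", "SH": "6", "ZH": "6",
    "K": "7", "G": "7",
    "F": "8", "V": "8",
    "P": "9", "B": "9",
}

def major_digits(phonemes, limit=3):
    """Return the digits encoded by a phoneme list, keeping at most `limit` digits."""
    digits = []
    for p in phonemes:
        symbol = re.sub(r"\d", "", p)  # drop stress markers such as the 1 in IH1
        if symbol in MAJOR_DIGITS:
            digits.append(MAJOR_DIGITS[symbol])
    return "".join(digits)[:limit]

# Hand-entered transcriptions, for illustration only.
SAMPLE = {
    "syrup": "S IH1 R AH0 P",
    "molasses": "M AH0 L AE1 S AH0 Z",
    "miles davis": "M AY1 L Z D EY1 V AH0 S",
}

for word, pron in SAMPLE.items():
    print(word, major_digits(pron.split()))
```

Passing `limit=None` keeps every consonant, which shows how much the first-three-consonants convention trims from longer words.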

You could quantify how similar words are using cosine similarity between word embeddings. A vector embedding associates a high-dimensional vector with each word in such a way that the geometry corresponds roughly with meaning. The famous example is that you might have, at least approximately,

queen = king − man + woman.

This gives you a way to define angles between words that ideally corresponds to conceptual similarity. Similar words would have a small angle between their vectors, while dissimilar words would have larger angles.
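To make the angle concrete, here is a small sketch that computes the angle between two vectors from their cosine similarity. The three-dimensional vectors are made up for illustration; real embeddings have hundreds of dimensions, and the function name `angle_degrees` is my own.

```python
import math

def angle_degrees(u, v):
    """Angle between vectors u and v, in degrees, via cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = dot / (norm_u * norm_v)
    # Guard against rounding pushing the cosine slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))

# Toy vectors, not taken from any real embedding.
similar_a = [0.9, 0.1, 0.0]
similar_b = [0.8, 0.3, 0.1]
unrelated = [0.1, 0.2, 0.9]

print(angle_degrees(similar_a, similar_b))  # small angle
print(angle_degrees(similar_a, unrelated))  # much larger angle
```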

If you wanted to write a program to discover pegs for you, say using a pronunciation dictionary transcribed in ARPAbet, such as the CMU Pronouncing Dictionary, you could have it choose alternatives that spread the words out conceptually. It’s debatable how practical this is, but it’s interesting nonetheless.
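One way such a program might spread words out is a greedy pass: for each number, pick the candidate whose worst-case similarity to the pegs already chosen is lowest. The sketch below is only one possible approach, not a tested method. The candidate lists and the toy vectors are invented for illustration, and `choose_pegs` is a hypothetical name.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def choose_pegs(candidates, vectors):
    """Greedily pick one word per number, avoiding words close to earlier picks."""
    chosen = {}
    for number, words in candidates.items():
        def worst_overlap(word):
            # Highest similarity between this word and any peg chosen so far.
            return max(
                (cosine(vectors[word], vectors[peg]) for peg in chosen.values()),
                default=-1.0,
            )
        chosen[number] = min(words, key=worst_overlap)
    return chosen

# Toy data: made-up vectors standing in for a real embedding.
vectors = {
    "syrup": [0.9, 0.1, 0.0],
    "molasses": [0.8, 0.3, 0.1],
    "miles": [0.1, 0.2, 0.9],
}
candidates = {"049": ["syrup"], "350": ["molasses", "miles"]}

print(choose_pegs(candidates, vectors))
```

A greedy pass depends on the order in which numbers are processed, so a different order could give a different set of pegs.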

The angles you get would depend on the embedding you use. Here I’ll use the gensim code I used in an earlier post.

The angle between “syrup” and “molasses” is 69° but the angle between “syrup” and “miles” is 84°. The former is larger than I would have expected, but still significantly smaller than the latter. If you were using cosine similarity to suggest mnemonic pegs, hopefully the results would be directionally useful, choosing alternatives that minimize conceptual overlap.

As I said earlier, it’s debatable how useful this is. Mnemonics are very personal. A musician might be fine with using “trumpet” for 143 and “flugelhorn” for 857 because in his mind they’re completely different instruments, but someone else might think they’re too similar. And you might not want to use “Miles Davis” and “trumpet” as separate pegs, even though software will tell you that “miles” and “trumpet” are nearly orthogonal.

[1] Here we’re following the convention that only the first three consonants in a word count. This makes it easier to think of pegs.
