One of the first problems I ran into in my game was how to generate a series of random letters. Each player’s board constantly produces random letters, and although that doesn’t sound too difficult, I wanted the letters to be ones that players could actually form words with.
If I went a strictly naive route, I’d just write a simple generator that picks a letter uniformly between A and Z. In English, however, not every letter has a 1 in 26 chance of appearing in a word. Most people know that there are a lot more E’s than X’s, for example. That’s why the X tile in Scrabble is worth more than the E or S tile. We could look to Scrabble for some insight into letter distribution, but I found that it hides too many of the details.
Then it occurred to me that I already have a very representative sample of English (the 172,000-word list). I can just analyze the text of the word list to find letter frequencies. Here’s the algorithm:
- Read each word from the word list file. Strip all whitespace and non-alphabetic characters.
- Build a dictionary with a char as a key and an integer as the value – this will hold the count of each kind of character.
- Iterate over each word, and over each character in the word, incrementing that character’s count in the frequency table. Also keep a running total of every letter seen, which gives the total number of letters in the word list.
- For each letter in the dictionary, find the percentage of use in the word list by dividing the frequency table count by the total letter count. This is the probability that the letter will occur in the word list.
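The steps above can be sketched in a few lines of Python. This is only an illustration, not the Content Processor code: the function name `letter_probabilities` and the tiny `sample_words` list are made up for the example, and it assumes the words are plain ASCII once cleaned.

```python
from collections import Counter

def letter_probabilities(words):
    """Return {letter: probability} for the letters A-Z found in words."""
    counts = Counter()
    for word in words:
        # Keep only the letters A-Z; this drops whitespace and punctuation.
        cleaned = "".join(ch for ch in word.upper() if "A" <= ch <= "Z")
        counts.update(cleaned)

    total = sum(counts.values())  # total number of letters in the list
    if total == 0:
        return {}
    return {letter: count / total for letter, count in counts.items()}

# Hypothetical stand-in for the real word list file.
sample_words = ["bee ", "axe\n", "e-z"]
probabilities = letter_probabilities(sample_words)
```

With the real word list, `words` would be the lines read from the file instead of `sample_words`.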
That analysis is actually performed by the Content Processor I wrote to build my word DFA. The result is a probability table that is written to disk and then imported by the game’s runtime.
The table that is produced is simply a 1,000-character table in which each letter is repeated in proportion to its probability (for example, E makes up 11.5% of the letters in my word list, so 115 of the entries are E). When you want a random letter, pick a random integer between 0 and 999 inclusive and use it as an index into the table. The character at that index is the random letter.
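Here is a minimal Python sketch of that lookup table, under a few assumptions. The names `build_letter_table` and `random_letter` are hypothetical, and the toy probabilities are invented for the example; only the 11.5% figure for E comes from my word list. Because each count is rounded, the table may end up a few entries off from 1,000, so the sketch indexes by the table's actual length rather than a hard-coded 999.

```python
import random

def build_letter_table(probabilities, size=1000):
    """Repeat each letter roughly probability * size times."""
    table = []
    for letter, p in sorted(probabilities.items()):
        # letter * n is a string of n copies; extend adds them one by one.
        table.extend(letter * round(p * size))
    return table

def random_letter(table, rng=random):
    """Pick a uniformly random index and return the letter stored there."""
    return table[rng.randrange(len(table))]

# Toy probabilities, invented for illustration only.
toy_table = build_letter_table({"E": 0.5, "A": 0.3, "X": 0.2}, size=10)

# The one figure from the word list: 11.5% of 1,000 slots is 115 E's.
e_slots = build_letter_table({"E": 0.115}).count("E")

letter = random_letter(toy_table, random.Random(0))
```

The appeal of the table is that picking a letter costs one random number and one array index at runtime; all the frequency work happens at build time.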
The result of all this is that players get the letters they need to form lots of words and aren’t left with a bunch of infrequently used letters like Q, X, Z, or J.