Demystifying Wordle: A Crash Course in Information Theory
Wordle大揭秘：資訊理論101

By Sonia Choy 蔡蒨珩

The game Wordle took the world by storm last year. You might have seen your friends posting green and yellow boxes on social media, claiming that they solved the daily word puzzle in three guesses, or showing the dreaded “X/6”, which means that they did not manage to crack it. When choosing a first word, it is tempting to type in a random five-letter word, but the choice can actually be treated as a scientific question. It is not hard to see that some words make a better first guess than others; for example, “FUZZY” is a far worse opener than “RAISE”, since the letters in the former occur far less often than the letters in the latter. What, then, is one’s best shot at cracking the puzzle?

What Is Wordle?

We assume most readers know how Wordle works; for those who do not, here is a quick crash course. Wordle’s database is made of 2,315 five-letter words picked by the creator of the game as solutions, and a pool of approximately 13,000 five-letter words that are valid guesses (which include the 2,315 words above, and many more words that are not commonly used) [1]. Each day, one of the solution words is selected to be the answer to the puzzle. If your guess has a letter that is in the word and in the same position, the letter box shown will be green; if the guess has a letter that is in the word but not in the correct position, the letter box shown will be yellow; otherwise the box is gray.

How to Define “Informative”?

To give a satisfying answer, we first need to quantify what is meant by “more useful”. A “useful” guess gives us more information; but how do we quantify information? Luckily for us, this was done in the 1940s by Claude Shannon, the father of information theory. Shannon defined information by the following equation:

$$I(p) = -\log_2 p,$$

where $p$ is the probability of the event happening. You may ask, why the logarithm function? Recall a property of the logarithm from high school:

$$\log(ab) = \log a + \log b.$$

If we have two independent events happening with probabilities $p_1$ and $p_2$ respectively, then the probability of them occurring together is $p_1 p_2$, and the information of the joint event is:

$$I(p_1 p_2) = -\log_2(p_1 p_2) = -\log_2 p_1 - \log_2 p_2 = I(p_1) + I(p_2).$$

So multiplying probabilities corresponds to adding the amounts of information they give. Information is typically measured in bits; in the case of Wordle, one bit corresponds to halving the number of possible answers, so the information in a result tells us how many times it cuts the pool of remaining choices in half.

It is rather unlikely for “FUZZY” to be the first hit. Suppose it returns five gray squares; what information do these squares give us? Take our two first guesses, “FUZZY” and “RAISE”, as examples. The probability of “F” occurring in an English word is approximately 2.2% (Table 1) [2], so the probability that it does not occur is 97.8%, or 0.978. We can find the corresponding probability for each letter in the guess and combine them, which gives 0.093 bits of information for “FUZZY” (footnote 1).

What if “RAISE” also returns five gray squares? The same calculation applies, now using the frequencies of R, A, I, S and E in Table 1.

Table 1 Selected relative letter frequencies in the English language (footnote 2) [2].

Letter   Relative Frequency   Letter   Relative Frequency
F        2.2%                 R        6.0%
U        2.8%                 A        8.2%
Z        0.074%               I        7.0%
Y        2.0%                 S        6.3%
                              E        13%
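The all-gray calculation described above can be tried out with a short script. This is only a minimal sketch under stated assumptions: the names `LETTER_FREQ`, `information_bits` and `all_gray_bits` are our own, the inputs are the rounded values from Table 1, letters are treated as independent, and a repeated letter (the double Z in “FUZZY”) is counted once. Because of the rounding and these conventions, the output need not match the figure quoted in the text exactly.

```python
import math

# Relative letter frequencies from Table 1, as fractions (rounded values).
LETTER_FREQ = {
    "F": 0.022, "U": 0.028, "Z": 0.00074, "Y": 0.020,
    "R": 0.060, "A": 0.082, "I": 0.070, "S": 0.063, "E": 0.13,
}


def information_bits(p):
    """Shannon information, in bits, of an event with probability p."""
    return -math.log2(p)


def all_gray_bits(guess):
    """Information gained when every letter of `guess` comes back gray.

    Assumptions (ours, for illustration): letters are independent, and each
    distinct letter is counted once, so the bits for each letter simply add.
    """
    distinct_letters = sorted(set(guess.upper()))
    return sum(information_bits(1 - LETTER_FREQ[c]) for c in distinct_letters)


for word in ("FUZZY", "RAISE"):
    print(f"{word}: {all_gray_bits(word):.3f} bits")
```

Running it shows the qualitative point of this section: an all-gray “RAISE” rules out several times more than an all-gray “FUZZY”, because common letters are much more informative when they turn out to be absent.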
