Something like this is bound to happen

Posted by piantado on October 28, 2009

Hilarious news from California: the Governator apparently sent a veto message to the legislature in which the first letter of each line spells “fuck you.” You can read about it here and here. His spokesman Aaron McLear is quoted as saying “My goodness. What a coincidence. I suppose when you do so many vetoes, something like this is bound to happen.”

I couldn’t resist actually computing the probability. You can pretty fairly assume that each word starting a line is chosen independently. If you take the (token) frequency with which each letter appears at the start of a word, you find that the probability of actually spelling “fuckyou” is about 10^{-12}. So yes, it is bound to happen, but only for one in 1 trillion (seven-line) vetoes. As for the alternative hypothesis, odds are probably better than 1 in 1 trillion that either this is a hoax, someone is trying to get fired, or Schwarzenegger has a sense of humor.

It’s also easy to find the most likely words to spell in the left-hand column. Here are the top four-letter words with their corresponding probabilities (of course, it’s not very likely you’ll get a word at all). There are so many “t”s because it is one of the most likely letters to start a word (p=0.15). It seems they could have picked some other insults if they really wanted the “coincidence” story to be a little more plausible:

twat 0.000235
watt 0.000235
tats 0.000227
that 0.000182
tact 0.000145
wait 0.000125
twit 0.000124
tits 0.00012
iota 0.000119
swat 9.90e-05
wast 9.90e-05
data 9.71e-05
asst 9.59e-05
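
A ranking like the one above can be sketched in a few lines of Python. This is only an illustration, not the script used for the post: the letter probabilities are copied from the table in the update below, while the candidate list and the helper name acrostic_prob are made up for the example (a real run would loop over a full dictionary).

```python
import math

# Initial-letter probabilities copied from the table in the update below
# (only the letters needed for the candidate words are included).
LETTER_PROBS = {
    "a": 0.153, "c": 0.0411, "d": 0.0273, "h": 0.0513, "i": 0.0809,
    "o": 0.063, "s": 0.0642, "t": 0.1521, "w": 0.0663,
}

# Hypothetical candidate list; a real run would use a full word list.
CANDIDATES = ["that", "twat", "tact", "data", "watt", "iota", "tats", "wait"]

def acrostic_prob(word):
    """Probability that independently chosen line-initial letters spell `word`."""
    return math.prod(LETTER_PROBS[ch] for ch in word)

# Sort candidates from most to least likely.
ranked = sorted(
    ((w, acrostic_prob(w)) for w in CANDIDATES),
    key=lambda pair: pair[1],
    reverse=True,
)

for word, p in ranked:
    print(f"{word} {p:.3g}")
```

Words that are anagrams of each other (“twat” and “watt”) get the same score, since the product does not depend on letter order.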

Update:

Apparently there’s been some interest in this so I’ll give a little more detail of how to compute this. If you have a list of word frequencies, you can add up the total frequency of all words beginning with “a”, beginning with “b”, beginning with “c”, etc. When you divide each of these numbers by the sum of all your frequencies, you get the probability that a randomly chosen word in a text will begin with each letter. These probabilities are listed below:

a 0.153
b 0.0445
c 0.0411
d 0.0273
e 0.0207
f 0.0431
g 0.0167
h 0.0513
i 0.0809
j 0.0041
k 0.005
l 0.023
m 0.036
n 0.0225
o 0.063
p 0.032
q 0.0021
r 0.0209
s 0.0642
t 0.1521
u 0.0109
v 0.0056
w 0.0663
x 0.0001
y 0.0133
z 0.0001

Note that this is not simply how many words start with each letter—instead, it’s the total probability of all words starting with each letter.
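
Here is a minimal sketch of that computation in Python. The function name initial_letter_probs and the toy frequency counts are invented for illustration; the numbers in the table came from a real frequency list, not from this toy data. The sketch assumes every entry is a non-empty word.

```python
from collections import defaultdict

def initial_letter_probs(word_freqs):
    """Map each initial letter to its share of the total token frequency.

    `word_freqs` maps word -> frequency count; words are assumed non-empty.
    """
    totals = defaultdict(float)
    for word, freq in word_freqs.items():
        # Add this word's token frequency to the bucket for its first letter.
        totals[word[0].lower()] += freq
    grand_total = sum(word_freqs.values())
    return {letter: totals[letter] / grand_total for letter in sorted(totals)}

# Toy counts, purely hypothetical.
toy_freqs = {"the": 50, "to": 30, "a": 20, "and": 25, "apple": 1, "banana": 1, "bat": 3}
probs = initial_letter_probs(toy_freqs)

for letter, p in probs.items():
    print(f"{letter} {p:.4f}")
```

In the toy data only two words start with “t” but they account for most of the tokens, so “t” gets the largest probability. That is the token-versus-type distinction made in the note above.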

Since to spell “fuckyou” you must first choose “f”, and then “u”, and then “c”, etc., you can compute the total probability of “fuckyou” by simply multiplying the probabilities for each letter (this assumes independence between the words which start each line, which is pretty reasonable). When you do this, you get a probability of 8.82 \cdot 10^{-13}. Above I just reported the rougher, slightly conservative, order of magnitude estimate of 1 in 1 trillion.
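
For anyone who wants to check the arithmetic, the multiplication can be reproduced with the table above. This is a small sketch under the same independence assumption; the names LETTER_PROBS and acrostic_prob are just labels chosen for the example.

```python
import math

# Initial-letter probabilities, copied from the table above.
LETTER_PROBS = {
    "a": 0.153, "b": 0.0445, "c": 0.0411, "d": 0.0273, "e": 0.0207,
    "f": 0.0431, "g": 0.0167, "h": 0.0513, "i": 0.0809, "j": 0.0041,
    "k": 0.005, "l": 0.023, "m": 0.036, "n": 0.0225, "o": 0.063,
    "p": 0.032, "q": 0.0021, "r": 0.0209, "s": 0.0642, "t": 0.1521,
    "u": 0.0109, "v": 0.0056, "w": 0.0663, "x": 0.0001, "y": 0.0133,
    "z": 0.0001,
}

def acrostic_prob(word, probs=LETTER_PROBS):
    """Multiply the per-letter probabilities, assuming independent lines."""
    return math.prod(probs[ch] for ch in word)

p = acrostic_prob("fuckyou")
print(f"{p:.3g}")  # 8.82e-13
```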

As Ted pointed out, what you probably really want to compute is the probability of anything shocking–if Schwarzenegger had said “biteme” instead, it might have also made news headlines. So how surprised should we be to see anything insulting? Well, to do this we could sum up the total probability of a big list of insults, but I don’t have one handy.
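
Given such a list, the sum itself would be easy. The sketch below is only a placeholder: the list contains just the two phrases mentioned in this post, and the names INSULTS and insult_prob_by_length are invented for the example. It assumes the acrostic has to fill every line of the message, so phrases are grouped by length and only phrases of the same length are summed together.

```python
import math
from collections import defaultdict

# Initial-letter probabilities from the table above
# (only the letters needed for the phrases below).
LETTER_PROBS = {
    "b": 0.0445, "c": 0.0411, "e": 0.0207, "f": 0.0431, "i": 0.0809,
    "k": 0.005, "m": 0.036, "o": 0.063, "t": 0.1521, "u": 0.0109,
    "y": 0.0133,
}

# Placeholder list: just the two phrases mentioned in the post.
INSULTS = ["fuckyou", "biteme"]

def acrostic_prob(word):
    """Probability of spelling `word` with independent line-initial letters."""
    return math.prod(LETTER_PROBS[ch] for ch in word)

def insult_prob_by_length(insults):
    """Total probability of spelling any listed phrase, per message length."""
    totals = defaultdict(float)
    for phrase in set(insults):  # drop duplicates so nothing is counted twice
        totals[len(phrase)] += acrostic_prob(phrase)
    return dict(totals)

by_length = insult_prob_by_length(INSULTS)
for n_lines, p in sorted(by_length.items()):
    print(f"{n_lines}-line message: {p:.3g}")
```

Distinct phrases of the same length are mutually exclusive outcomes, so adding their probabilities is legitimate. Shorter phrases come out far more likely than longer ones, because each extra line multiplies in another small factor.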