You are an octopus who loves sushi. Every friday night, you visit the local sushi restaurant and chow down. Your favorite sushi is the Spider Roll and least favorite is Tako. But, you are the kind of octopus who really likes variety—a totally random mix of sushi.
Unfortunately, you go to a sushi restaurant with a conveyor belt of sushi, where each roll moves past you in succession. As each roll passes, you can decide whether to pick it up (and put down one if your arm is currently holding one) or ignore it and wait for the next. Your problem is to find some way of choosing a sample of sushi rolls uniformly, such that each sushi roll that passes you has an equal probability of ending up being in one of your arms at the end. Sadly, you don’t know how long the sushi belt will continue to run. What to do?
One thing you might think to do is as each sushi roll passes you, you flip a coin. If the coin is heads, you pick up the current roll with an arm, and throw out that arm’s sushi. If the coin is tails, you ignore the current roll. Done dumbly, this, of course, biases you against keeping the earlier sushi, since they have a greater chance of being replaced if they’re even picked up. So you figure you can change the coin weight as the sushi roll past, and maybe, with some luck, you can end up with a uniform sample. But, if you don’t know how many sushi rolls will go by, you might have trouble deciding how quickly the coin weight should change…
But there’s a better way. It’s called reservoir sampling. Here’s one way to think about it, before you think about it: suppose that we assigned a random number chosen iid, uniformly from [0,1], to each sushi roll on the conveyor belt—all the ones you have seen and will see.
Now suppose that to take a uniform random sample of 8 of them (one for each arm), you take the rolls with the 8 lowest random numbers. Each roll has an equal probability of having its number in the lowest 8, so your sample of the lowest 8 is uniform.
What have you gotten? Well, something very nice. By thinking of it in those terms, rather than coin-flipping-accept-reject terms, you can make a pretty online algorithm. Each time you see a new sushi roll, you draw a random number. You hold on to the 8 sushi rolls with the lowest numbers, and if the new roll is lower than the largest of your 8, you add it to your 8, and drop the largest. At the end of this process, you are holding in your 8 arms the 8 lowest samples from all of them. And you didn’t need to ever hold on to more than 8 rolls. Cool.
(If you aren’t really an octopus, maybe you’re doing NLP or psycholinguistics and trying to draw N random sentences containing the word “octopus” from a corpus, without holding all occurrences of “octopus” in memory.)
Of course, that algorithm I gave isn’t great. In particular, you have to look at each sushi roll. But of course you could draw random numbers for the next several rolls and not even look at them if they are too large. Amazingly, you can sample 8 rolls uniformly, from N total in less than time. Thank this guy. In fact, drawing n samples uniformly from N is doable in
time.
Pretty, clever algorithm.

