





Hypergeometric Distributions: An Explanation
I love math. I mean, I'm a nerd, so no big surprise. But I really
love math. And statistics have been an area of particular interest to me
this year (2007) for some strange reason. So I was messing around and
learned about
hypergeometric distribution.
To put it in layman's terms, think of a jar (or
urn) filled with
marbles. Some are white, some black. If you draw some marbles out of the
jar, the hypergeometric probability formula can tell you the odds of drawing
a certain number of white marbles.
In practical terms, think of Keno  the total number of marbles would be 80,
one for each number on the board. White marbles would be your 10 numbers
picked, and the draw would be the 20 numbers the computer picks. In this
situation, you can use hypergeometric distribution probabilities to easily
figure out just how quickly the house is stealing your money! (Be warned, in
most casinos it's pretty dang fast)
Hypergeometric distributions are also abundant in state lotteries, poker, and
a variety of other games of chance.
Examples
For your convenience, here are a couple examples for setting up probability
computations:
Typical 10spot keno game  show
 Total Marbles: 80
 White Marbles (your 10 spots): 10
 Draws (computer picks): 20
The odds will tell you your chances of each outcome, from getting 0
matches to all 10.
Hold 'em poker hand  getting trips on the flop when holding a pocket
pair
 show
 Total Marbles (unknown cards left in the deck): 50
 White Marbles (your 2 outs): 2
 Draws (cards shown on the flop): 3
The odds will tell you what your chances are of getting 0, 1, or 2 of your
2 outs. In this case it helps to look at the odds of 0 and reverse them,
because if you get 1 or 2, you've hit your hand.
State lottery  show
 Total Marbles (numbers to choose from): 49
 White Marbles (your 6 picks): 6
 Draws (computer picks): 6
Nothing special here  just like keno, odds are shown for hitting 0 to 6
of your chosen numbers.
History
Why did I write this code? Did I just want to upstage the people who wrote the other one Wikipedia
links to? Well, no. Actually I wrote my code a while before even realizing
that Wikipedia had any external links.
It all started long ago (like 6 months ago) when I wanted to find the odds for
building a state lottery. I built a little Ruby program that computed the
odds for a given number range and a given number of picks. It was very
exciting.
More recently, I was bragging about my awesome lottery odds calculator (nerds
brag about these sorts of things, it's true), and a friend asked if I could
get the odds for Keno (he was playing at pogo.com and thought the Keno game
looked pretty easy to win real money). After a long while of researching, I
finally discovered that Keno was a hypergeometric distribution problem. So I
went to work to retrofit my old Ruby code to work with all distributions.
When I was done, I noticed something interesting  the code would still work
for a state lottery (try setting white marbles and draws to be the same number
 i.e., you choose the same number of spots as the computer). Or poker odds:
if there are X unknown cards, you have Y outs, and you're computing for a
single draw, you'd put marbles as X, white marbles as Y, and draws would be 1.
So basically I was terribly impressed by the flexibility of this algorithm 
it seems to exist in some way or another on so many different complicated
games of chance. For me, this was very interesting.
That being said, once I noted Wikipedia's entry had an external link, I made a
point to get my code in a public place, write these explanations (and the web
based calculation form), and add myself to the WP page. I may not be smart
and cunning, but at least I eventually realized I could steal somebody else's
thunder!
The move to C++
But why did I write a C++ version of this code? The Ruby code works
extremely well for small sample sizes, it's a lot less code to write,
it is sexier code, it is my favorite language, I haven't used C++
regularly for probably 4 years, and so on.
But I recently ran across what amounts to my potential dream job (game
programming), and they wanted a code sample. Their stuff is of course
primarily C++, and I needed to prove that I could jump back into C++ without
too much trouble. So I spent some time reacquainting myself with C++ and
implemented the Ruby class in C++.
The end.
Source code:
Testing has (obviously) revealed that the C++ code is much faster than the Ruby, but
surprisingly the Ruby code ain't half bad. The Ruby code is obviously slower,
but other than that the two are very similar. Both will overflow with huge numbers
(such as my maximums on the calculation page). The Ruby code can hit a stack
overflow as well, but only when using such huge numbers that you really ought to
be writing your own big number class....
Anyway, here they are:
Ruby
C++
Amazon.com 100 Hot DVDs 




