Nerdbucket Blog!
Leave Comments
Contact Me
Nerdy Links
Privacy Policy

Member Area:
Member FAQ
Log In
Register

Jump To Section:
Calculations
Explanation
Real-world use
 


Hypergeometric Distributions: An Explanation

I love math. I mean, I'm a nerd, so no big surprise. But I really love math. And statistics have been an area of particular interest to me this year (2007) for some strange reason. So I was messing around and learned about hypergeometric distribution. To put it in layman's terms, think of a jar (or urn) filled with marbles. Some are white, some black. If you draw some marbles out of the jar, the hypergeometric probability formula can tell you the odds of drawing a certain number of white marbles.

In practical terms, think of Keno - the total number of marbles would be 80, one for each number on the board. White marbles would be your 10 numbers picked, and the draw would be the 20 numbers the computer picks. In this situation, you can use hypergeometric distribution probabilities to easily figure out just how quickly the house is stealing your money! (Be warned, in most casinos it's pretty dang fast)

Hypergeometric distributions are also abundant in state lotteries, poker, and a variety of other games of chance.

Examples

For your convenience, here are a couple examples for setting up probability computations:

Typical 10-spot keno game - show
  • Total Marbles: 80
  • White Marbles (your 10 spots): 10
  • Draws (computer picks): 20
The odds will tell you your chances of each outcome, from getting 0 matches to all 10.

Hold 'em poker hand - getting trips on the flop when holding a pocket pair - show
  • Total Marbles (unknown cards left in the deck): 50
  • White Marbles (your 2 outs): 2
  • Draws (cards shown on the flop): 3
The odds will tell you what your chances are of getting 0, 1, or 2 of your 2 outs. In this case it helps to look at the odds of 0 and reverse them, because if you get 1 or 2, you've hit your hand.

State lottery - show
  • Total Marbles (numbers to choose from): 49
  • White Marbles (your 6 picks): 6
  • Draws (computer picks): 6
Nothing special here - just like keno, odds are shown for hitting 0 to 6 of your chosen numbers.

History

Why did I write this code? Did I just want to upstage the people who wrote the other one Wikipedia links to? Well, no. Actually I wrote my code a while before even realizing that Wikipedia had any external links.

It all started long ago (like 6 months ago) when I wanted to find the odds for building a state lottery. I built a little Ruby program that computed the odds for a given number range and a given number of picks. It was very exciting.

More recently, I was bragging about my awesome lottery odds calculator (nerds brag about these sorts of things, it's true), and a friend asked if I could get the odds for Keno (he was playing at pogo.com and thought the Keno game looked pretty easy to win real money). After a long while of researching, I finally discovered that Keno was a hypergeometric distribution problem. So I went to work to retrofit my old Ruby code to work with all distributions. When I was done, I noticed something interesting - the code would still work for a state lottery (try setting white marbles and draws to be the same number - i.e., you choose the same number of spots as the computer). Or poker odds: if there are X unknown cards, you have Y outs, and you're computing for a single draw, you'd put marbles as X, white marbles as Y, and draws would be 1.

So basically I was terribly impressed by the flexibility of this algorithm - it seems to exist in some way or another on so many different complicated games of chance. For me, this was very interesting.

That being said, once I noted Wikipedia's entry had an external link, I made a point to get my code in a public place, write these explanations (and the web- based calculation form), and add myself to the WP page. I may not be smart and cunning, but at least I eventually realized I could steal somebody else's thunder!

The move to C++

But why did I write a C++ version of this code? The Ruby code works extremely well for small sample sizes, it's a lot less code to write, it is sexier code, it is my favorite language, I haven't used C++ regularly for probably 4 years, and so on.

But I recently ran across what amounts to my potential dream job (game programming), and they wanted a code sample. Their stuff is of course primarily C++, and I needed to prove that I could jump back into C++ without too much trouble. So I spent some time reacquainting myself with C++ and implemented the Ruby class in C++.

The end.

Source code:

Testing has (obviously) revealed that the C++ code is much faster than the Ruby, but surprisingly the Ruby code ain't half bad. The Ruby code is obviously slower, but other than that the two are very similar. Both will overflow with huge numbers (such as my maximums on the calculation page). The Ruby code can hit a stack overflow as well, but only when using such huge numbers that you really ought to be writing your own big number class....

Anyway, here they are:

Ruby

C++




Amazon.com 100 Hot DVDs
"