The Yak

Posted on
dicemagic go learning
The Yak /images/lightbulb.gif

My friend Andrew introduced me to an idiom I had never heard before, which i’m quite fond of. Yak Shaving looks a lot like this. I’m tired, and staying up with Samuel while Natalie sleeps, so I’ll share my most recent Yak Shaving story.

The problem

My side project Dice Magic has super-bad code coverage, some modules are around 90%, but most are in the 50s. Well, it should be easy to write a few unit tests and bump up those numbers. Maybe I’ll even run gofmt and fix all my lint warnings.

Step 1: Decide where to start

Dice Magic has become a rather complex beast. We’ve got 2 parsers, integrations with multiple chat systems and DialogFlow. A DAL, a caching layer, logging, a pubsub queue, on and on.

I should probably start with the most basic, core function in the app. It rolls dice, so let’s start with the Roll() function.

//Roll creates a random number that represents the roll of some dice
func Roll(numberOfDice int64, sides int64) (int64, error) {
    //error handing removed for brevity
    result := int64(0)
    for i := int64(0); i < numberOfDice; i++ {
        x, err := generateRandomInt(1, int64(sides))
        if err != nil {
            return 0, err
        }
        result += x
    }
    return result, nil
}

So, the nut of this function is another function call to generateRandomInt(). As you might expect, the results from that function are random. Not much to test here… Or is there?

func generateRandomInt(min int64, max int64) (int64, error) {
    size := max - min
    //error handing removed for brevity
    //rand.Int does not return the max value, add 1
    nBig, err := rand.Int(rand.Reader, big.NewInt(int64(size+1)))
    if err != nil {
        err = fmt.Errorf("Couldn't make a random number. Out of entropy?")
        return 0, err
    }
    n := nBig.Int64()
    return n + int64(min), nil
}

Step 2: Learn about statistical analysis

I could mock this out and move on, but I actually do want to know if the dice this whole house of cards is built on are fair. The problem is, I have no idea how to validate my assumptions here.

The Yak /images/scarymath.svg

A quick Google search led me to the concept of a Chi-squared test with some big phrases like “goodness of fit” and pictures that I did not comprehend.

After a rather deep Wikipedia hole, I decided this was, indeed what I wanted to do. And after some further Googling, and reading lots of sites that look like this I learned that a chi-squared test is real easy to use! R has an implementation, surely Go does too. Go does not

Step 3: Find chi-squared tests for Go

But Go DOES have lots of modules people have built. I found gonum, which implemented something called a chi-squared distribution, but no chi-squared test. Ok, back to the internet and more academic papers. Maybe I can implement this myself.

Step 4: Eureka!

After much reading I found this nugget on Wikipedia

Accordingly, since the cumulative distribution function (CDF) for the appropriate degrees of freedom (df) gives the probability of having obtained a value less extreme than this point, subtracting the CDF value from 1 gives the p-value.
- https://en.wikipedia.org/wiki/Chi-squared_distribution

And what’s even better, is after 2 days of this adventure, I knew that that meant!

I can use gonum to create the chi-squared distribution from my degrees of freedom, calculate the cumulative distribution based on the distance between my expected and observed results! Then all I have to do is subtract from 1 to get p

First I’ll read a whole bunch of results into buckets

    m := make(map[int64]int)
    for i := 0; i < numberOfLoops; i++ {
        x, _ := generateRandomInt(1, numberOfBuckets)
        m[x]++
    }

It’s pretty easy to calculate the “expected” result for each bucket, it’s just the average.

// iterate over buckets of random numbers 
expv := float64(int64(numberOfLoops) / numberOfBuckets)
for e := range m {
    obs = append(obs, float64(m[e]))
    exp = append(exp, expv)
}
c := stat.ChiSquare(obs, exp)
p := 1 - distuv.ChiSquared{K: float64(df), Src: nil}.CDF(c)
t.Logf("chi2=%v, degOfFreedom=%v, p=%v", c, df, p)

looks like the random number generator that feeds Roll() is perfectly fair.

--- PASS: Test_generateRandomInt (3.94s)
        roll_test.go:33: chi2=170.2808, df=199, p=0.9307575285063816

But what about Roll() itself? It loops over generateRandomInt() once for each die. We should get a cool curve of probabilities. It should be pretty easy to calculate the expected value of each dice roll. I play a lot of craps, I sort of know this stuff.

I do not know this stuff.

Step 5: Go to sleep, wake up, try again

The Yak /images/dice-probability.gif

Calculating probabilities for 2 n-sided dice is pretty simple. Just grid out all possibilities and the number of possibilities that equal your target number divided by the total number of possibilities is the probability.

But what about more complex dice throws?

I decided to start by creating a go routine that outputs all possible combinations of a dice roll to a channel so I could process them.

Except… How do you do that?

Step 6: Back to Wikipedia

The Yak /images/combinations.png

Initial google searches led me to a number of possible solutions. Some were yielding combinations and some were yielding permutations, and I didn’t know the difference.

As I was halfway through implementation, I realized that combinations weren’t working as I was expecting. Combinations are over one set, I have 2 sets.

Step 7: Learn about set theory

As an aside, I never really went to school much. Only 1.5 years of high school and some collage courses here and there. There are big gaps in my understanding of math and algorithms. Nevertheless, I love figuring it out, but I often have to start from square 1. Which looks like this:

The Yak /images/tabs.png

Some more Wikipedia digging led me to the concept of a Cartesian product. More specifically the n-ary Cartesian product. Once I could search for that, StackOverflow to the rescue!

‘Course, I’m here to learn go, so I’m going to implement it with channels and range

func diceProbability(t *testing.T, numberOfDice int64, sides int64, target int64) float64 {
    rollAmount := math.Pow(float64(sides), float64(numberOfDice))
    targetAmount := float64(0)
    var possibilities []int64
    for i := int64(1); i <= sides; i++ {
        possibilities = append(possibilities, i)
    }
    c := make(chan []int64)
    go GenerateProducts(c, possibilities, numberOfDice)
    for product := range c {
        if sumInt64(product...) == target {
            targetAmount++
        }
    }
    p := (targetAmount / rollAmount)
    return p
}

diceProbability() will start a goroutine GenerateProducts() that outputs each possible dice roll to channel c. Then I’ll range over that channel and increment a counter if the sum of the product is the target.

Finally, we can calculate the probability of an arbitrary number coming up on an arbitrary dice roll.

Step 8: Put it all together

So we can calculate dice probabilities, and we can calculate chi-squared tests, now we just need to put it together!

    --- PASS: TestRoll/8d4 (2.80s)
        roll_test.go:109: Rolling 8d4 100000 times
        roll_test.go:110: Bucket : Probability : Expected : Observed
        roll_test.go:111: ------------------------------------------
        roll_test.go:116:      8 :  0.0015259% :   1.5259 :        0
        roll_test.go:116:      9 :   0.012207% :   12.207 :       12
        roll_test.go:116:     10 :   0.054932% :   54.932 :       56
        roll_test.go:116:     11 :    0.18311% :   183.11 :      180
        roll_test.go:116:     12 :    0.49133% :   491.33 :      482
        roll_test.go:116:     13 :     1.1108% :   1110.8 :     1153
        roll_test.go:116:     14 :      2.179% :     2179 :     2194
        roll_test.go:116:     15 :      3.772% :     3772 :     3791
        roll_test.go:116:     16 :     5.8334% :   5833.4 :     5755
        roll_test.go:116:     17 :     8.1299% :   8129.9 :     8048
        roll_test.go:116:     18 :     10.266% :    10266 :    10426
        roll_test.go:116:     19 :     11.792% :    11792 :    11750
        roll_test.go:116:     20 :     12.347% :    12347 :    12238
        roll_test.go:116:     21 :     11.792% :    11792 :    11760
        roll_test.go:116:     22 :     10.266% :    10266 :    10363
        roll_test.go:116:     23 :     8.1299% :   8129.9 :     8103
        roll_test.go:116:     24 :     5.8334% :   5833.4 :     5947
        roll_test.go:116:     25 :      3.772% :     3772 :     3714
        roll_test.go:116:     26 :      2.179% :     2179 :     2202
        roll_test.go:116:     27 :     1.1108% :   1110.8 :     1128
        roll_test.go:116:     28 :    0.49133% :   491.33 :      477
        roll_test.go:116:     29 :    0.18311% :   183.11 :      164
        roll_test.go:116:     30 :   0.054932% :   54.932 :       38
        roll_test.go:116:     31 :   0.012207% :   12.207 :       19
        roll_test.go:121: chi2=25.17996518137826, df=23, p=0.34107352659248535
PASS

Step 9: Celebrate success

2 days and 10k dirty diapers later, I’ve imporved code coverage by 1%!

Full code available here, if you’re interested.