How the Hearts AI was broken on iOS

A few months ago, I received an email from someone telling me that there was a bug in the Hearts AI. The user reported that the AI always played the Queen of Spades (♠Q) the first time spades was played, regardless of whether it was a good idea or not. Whenever I get a report like this, I always verify whether I can reproduce the bug. That's actually rule number 1 when you're debugging: you need to be able to reproduce the bug, otherwise it's simply impossible to debug it.

However, when I played a few games of Hearts, I never noticed the AI playing ♠Q on the first spades trick when it actually shouldn't. No matter how many games I played, I could not reproduce it, so I had to tell the user that unfortunately I was not able to find a problem with the AI, and that I hence couldn't do anything about it.

A few days ago, however, I received yet another email from the same user saying that the problem was still there. I played a few games again, and this time I was indeed able to reproduce the bug. The specific case I found is illustrated below. In this situation, the AI would play ♠Q, while that's clearly not a good idea: Monica can probably play a lower card, and hence the AI gets hit with the 13 penalty points of ♠Q. The right thing to do here would be to play ♠10, or perhaps ♠A.

Now that I finally had a case that reproduced the bug, I could actually start debugging. Beware: the rest of this blog post is a bit technical, so if that's not your thing, you can just stop reading here and remember that the bug is now fixed.

Still here? Cool! If you have read the blog post about how the AI works, you know that there are a lot of test cases for the AI, so I grabbed the game log from the situation above and created a test case for it, which looks like this:

it('does not play ♠Q when not needed', function() {
  this.log = `
    # 0
    1: ♥AQ4 ♦K7 ♣Q963 ♠AQ86
    2: ♥873 ♦Q852 ♣K104 ♠1093
    3: ♥K102 ♦A9643 ♣87 ♠K54
    0: ♥J965 ♦J10 ♣AJ52 ♠J72
    0: [♣2 ♣5 ♦J]
    1: [♣Q ♠Q ♠A]
    2: [♣K ♣10 ♣4]
    3: [♣8 ♣7 ♠K]
    # 1: ♥AQ4 ♦KJ7 ♣96532 ♠86
    # 2: ♥873 ♦Q852 ♣Q ♠AQ1093
    # 3: ♥K102 ♦A9643 ♣K104 ♠54
    # 0: ♥J965 ♦10 ♣AJ87 ♠KJ72
    1: ♣2 ♣Q ♣K ♣A
    0: ♠J ♠8
  `;
  let card = this.decide();
  expect(card).to.not.equal('♠Q');
});

As I knew the AI would play ♠Q in this case, the expectation was that the test would fail, but to my big surprise it passed, and the AI correctly played ♠10! This is every programmer's nightmare: bugs that appear "in the wild", but are impossible to recreate under controlled circumstances. Once stuff like this happens, you know that you're in for a good debugging session. And boy, was it a good one!

The situation above was encountered on my iPhone, so that was my first reflex: let's actually run the test on an iPhone instead of on my computer. And indeed, there it was: the test failed on the iPhone, but passed on my computer. That was a big step forward, because I could now reliably reproduce the bug, but the downside is that debugging on an iPhone is an absolute nightmare: you don't have access to the debugging tools you would use on your computer. If you're familiar with JavaScript: this meant that I had to debug on my phone using alert(). Ouch!

After diving into the code for a while, I found that at some point a NaN value was introduced in the AI, but only on iPhone. NaN stands for Not a Number, and it is a value that you get in JavaScript when you divide zero by zero, as well as in a few other cases. The problem with NaN is that any numerical operation involving NaN results in NaN as well, and comparisons like value > NaN are meaningless because they always evaluate to false. In less technical terms: once NaN appears somewhere in the AI, the behavior of the AI becomes undefined and unpredictable, so that was what was happening.
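You can see this behavior for yourself in any JavaScript console:

let x = 0 / 0;       // NaN
x + 1;               // NaN: arithmetic involving NaN stays NaN
x > 5;               // false
x < 5;               // false: every comparison with NaN is false
Number.isNaN(x);     // true: the only reliable way to detect NaN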

After some further investigation, I found that the NaNs were introduced by a function that calculates binomial coefficients. As you might remember from your math classes, binomial coefficients are calculated using factorials as

\binom{n}{k} = \frac{n!}{(n-k)!\ k!}
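For example, the number of ways to pick 2 cards out of the 13 cards in a suit is

\binom{13}{2} = \frac{13!}{11!\ 2!} = 78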

Binomial coefficients are used in a lot of places in the AI, but as calculating the factorials can be time-consuming, the most commonly used factorials were pre-calculated so that we could just read them from a cache. More specifically, we hardly ever need a factorial larger than 13! in the AI because there are only 13 cards in a suit, so it makes sense to cache all factorials up to 13!.

The values of these cached factorials were stored in a Uint32Array, which is a JavaScript data structure that stores the values as unsigned 32-bit integers, and is created as

const cache = new Uint32Array(13);
cache[0] = 1; // 0! = 1
for (let i = 1; i <= 12; i++) {
  cache[i] = cache[i-1]*i; // i! = (i-1)! * i
}

Note however that the cache only contains values up to 12!, because 13! = 6 227 020 800 is bigger than the largest unsigned 32-bit integer 2^32 - 1 = 4 294 967 295. The function that actually calculates the factorials then looked like this:

function factorial(x) {
  if (x < 0) return 1;
  // Read from the cache; if the entry doesn't exist, compute recursively.
  return cache[x] || x*factorial(x-1);
}

What happens here is that we read the value from the cache, unless it doesn't exist, in which case we rely on the recursive approach.
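For example, assuming the cache above is in scope:

factorial(5);  // 120: read directly from the cache
factorial(13); // cache[13] is undefined (out of range), which is falsy,
               // so this falls through to 13 * factorial(12)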

However, modern JavaScript engines perform a lot of optimizations, and that's actually why we use a Uint32Array for the cache: that way the JavaScript engine can internally store the numbers as 32-bit integers and speed up the calculations, even though JavaScript numbers are actually implemented in double-precision 64-bit binary format (IEEE 754), meaning they can hold values larger than 2^32 - 1.
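Note that a Uint32Array itself can only hold values that fit in 32 bits; anything bigger is silently stored modulo 2^32:

const a = new Uint32Array(1);
a[0] = 6227020800;  // 13! does not fit in 32 bits...
console.log(a[0]);  // 1932053504: the value was stored modulo 2^32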

So what happens on iOS - at least that's my guess - is that once the factorial() function has been called a few times, the JavaScript engine optimizes it. However, this optimization happens without the engine ever having seen values larger than 2^32 - 1, so it assumes that the function will only ever produce values that can be represented as 32-bit integers. Once we call the factorial() function with x = 13, that assumption no longer holds, but the optimized code doesn't notice and incorrectly returns 0. That means that further down the road we end up dividing zero by zero in the binomial coefficients, and - boom - that's where our NaN comes from!

What's interesting is that other JavaScript engines - such as Chrome's V8 engine - do handle this case correctly, and as a result the bug cannot be reproduced there. It seems to be an actual bug in Safari on iOS! What's also interesting is that as long as Safari does not optimize the factorial() function, it produces the expected results. For example,

const cache = new Uint32Array(13);
cache[0] = 1;
for (let i = 1; i <= 12; i++) {
  cache[i] = cache[i-1]*i;
}

function factorial(x) {
  if (x < 0) return 1;
  return cache[x] || factorial(x-1)*x;
}

// Value is 6 227 020 800 as expected
let value = factorial(13);

produces the expected results. But if we first "warm up" the factorial() function with only inputs up to 12, so that it never computes a factorial larger than 12!,

// Warm up the factorial function
for (let i = 1; i <= 12; i++) {
  for (let j = 0; j < 1e5; j++) {
    factorial(i);
  }
}
let value = factorial(13);

then the factorial() function gets optimized by the JavaScript engine and calculates 13! = 0, which is incorrect!

Even more confusing is that when I rewrote the factorial() function as

function factorial(x) {
  // Notice the <= instead of <
  if (x <= 0) return 1;
  return cache[x] || factorial(x-1)*x;
}

then everything works as expected! This is possibly because that way the JavaScript engine does not optimize the function, which only shows that the way JavaScript engines optimize code is complex, and in the case of Safari even buggy!

I also tried to reproduce the bug by creating a web page that does nothing but calculate the factorial of 13, but there I was unable to reproduce the issue. I guess this is because not enough JavaScript is being executed for the engine to consider the optimization worth it. The Whisthub AI logic, however, executes a lot of JavaScript already, so I'm guessing the optimization kicks in there because there's more to be gained!

Anyway, I ended up solving the bug by reworking how the binomial coefficients are calculated. There is actually a way to compute them that avoids the factorials and their correspondingly large values; the idea is sketched below.
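To give a rough idea, here's a minimal sketch of the multiplicative formula (not necessarily the exact code that now runs on Whisthub): it builds up the binomial coefficient step by step, and the intermediate values never grow anywhere near 32-bit territory for n up to 13.

function binomial(n, k) {
  if (k < 0 || k > n) return 0;
  k = Math.min(k, n - k); // C(n, k) equals C(n, n-k), so use the smaller one
  let result = 1;
  for (let i = 1; i <= k; i++) {
    // Multiply before dividing; the division is always exact because
    // the intermediate result equals the integer C(n-k+i, i).
    result = result * (n - k + i) / i;
  }
  return result;
}

// No factorials, no overflow:
binomial(13, 6); // 1716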

So, the bug should no longer be present, and on top of that I now also have an explanation for why I was initially unable to reproduce it: when I played against the AI on my computer using Google Chrome, everything worked perfectly fine. It was only once I started playing against the AI on iOS that the bug became apparent! It was pure luck that I tried to reproduce it on my iPhone the second time and hence encountered the bug; otherwise I might never have found it.

Phew!