Technology & Training
PokerSnowie's Poker Knowledge
PokerSnowie is artificial intelligence-based software for No Limit Hold'em Poker. It has learned to play No Limit, from heads-up games to full ring games (10 players), and knows how to play from short stacks all the way up to very deep stacks (400 big blinds).
PokerSnowie is based on artificial neural networks. Those neural networks are a mathematical model based on biological neural networks, like those found in the human brain. While biology is far from explaining exactly how the neurons in the brain work and how learning takes place, some principles can be expressed as mathematical formulas, resulting in artificial neural networks.
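To make "principles expressed as mathematical formulas" concrete, here is a minimal sketch of a single artificial neuron: a weighted sum of inputs passed through a squashing function. This is a generic textbook illustration, not PokerSnowie's actual network, whose architecture is not described here. The function name `neuron` and the example weights are assumptions chosen for illustration.

```python
import math

def neuron(inputs, weights, bias):
    """Generic artificial neuron: weighted sum of inputs, then a sigmoid.

    Illustrative only; PokerSnowie's real network layout is not public here.
    """
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # output squashed into (0, 1)

# With no input signal and no bias the neuron is undecided (0.5);
# strong positive weights on active inputs push it close to 1.
undecided = neuron([0.0, 0.0], [1.0, 1.0], 0.0)
confident = neuron([1.0, 1.0], [5.0, 5.0], -2.0)
```

A network connects many such units in layers, and learning consists of adjusting the weights and biases.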
The idea of artificial neural networks was first proposed in 1943 by Warren McCulloch and Walter Pitts, and many applications now use them very successfully. Neural networks have become standard in many areas of industry. However, the challenge was to design a learning algorithm for the game of Poker, which is a very complex multi-player game with hidden information. While for the specific variant of fixed limit heads-up the computer is known to be just as good as the best professional Poker players, nobody had managed to create strong artificial intelligence-based software for the most popular (and most complex) variant: the No Limit full ring game. This goal has now been achieved with the release of PokerSnowie.
PokerSnowie's initial training
To begin with, PokerSnowie played completely at random. After each hand, successful betting lines were reinforced, while unsuccessful moves were learned from and played less often. For example, calling on the river with a weak high-card hand mostly loses, so PokerSnowie called less and less with such hands, whereas trips win most of the time, so calling with them was reinforced.
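The reinforcement idea described above can be sketched with a toy example. This is not PokerSnowie's training algorithm, which is not detailed here; it is a minimal illustration in which a player starts by calling a river bet half the time and nudges its calling frequency up after winning calls and down after losing ones. The win rates, pot and bet sizes, learning rate, and the name `train` are all hypothetical.

```python
import random

# Hypothetical chance that a river call wins with each hand class.
WIN_RATE = {"high_card": 0.10, "trips": 0.85}
POT, BET = 2.0, 1.0      # a call risks BET to win POT (illustrative sizes)
LEARNING_RATE = 0.01     # how strongly one result shifts the strategy

def train(hands=20000, seed=0):
    """Return the learned calling frequency for each hand class."""
    rng = random.Random(seed)
    call_prob = {hand: 0.5 for hand in WIN_RATE}  # start with random play
    for _ in range(hands):
        hand = rng.choice(list(WIN_RATE))
        if rng.random() >= call_prob[hand]:
            continue  # folded: payoff is zero, nothing to reinforce
        payoff = POT if rng.random() < WIN_RATE[hand] else -BET
        updated = call_prob[hand] + LEARNING_RATE * payoff
        call_prob[hand] = min(0.99, max(0.01, updated))  # keep some exploration
    return call_prob
```

Calling `train()` should leave the calling frequency for trips close to its upper limit and the frequency for high-card hands close to its lower limit, mirroring the behaviour described in the text.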
Many people are surprised that computers can learn something 'psychological' like a bluff. In fact, this is one of the first things PokerSnowie learned. If a bluff is often successful in certain situations, bluffing is reinforced and PokerSnowie bluffs more often.
No expert knowledge
There is no expert knowledge built into PokerSnowie's strategy. This proved to be a disadvantage at the start of training: strong hands like a full house were randomly played and even folded. Playing quads or straight flushes was especially difficult to learn because it's very rare to hold those hands. Here, a big difference can be seen between human learning and PokerSnowie's learning: a human would know that quads is a very strong hand that wins almost every time. This leads to the obvious conclusion that quads should never be folded. PokerSnowie, however, sees a hand that it doesn't know and with which it has very little experience. Only slowly did it adapt its strategy with these hands in the right direction. Of course it's easy for the neural network to learn how to play these kinds of hands, but it takes time.
On the other hand, giving PokerSnowie the complete freedom to learn whatever it thinks is best has extraordinary advantages. If an expert were to stipulate parts of the strategy, those parts could not be improved by PokerSnowie, even if the expert's strategy proved to be wrong. The beauty of the expert-free approach is that PokerSnowie can become a much better poker player than the programmers and also better than humans in general!
PokerSnowie on the way to the ultimate balanced game
After the initial phase of training, PokerSnowie had learned the basics: folding bad hands, calling with good hands, raising as a bluff and for value. PokerSnowie's strategy already ranked alongside that of good Poker players. However, the strategy was still quite unbalanced. In some situations, for example, it would bluff far too much. An attentive opponent could exploit this by calling with weaker hands or raising back, and would score a nice profit against PokerSnowie in the long run.
The next, and most extensive, phase of the training was about getting the balance right. PokerSnowie constantly played against adapting agents that tried to exploit PokerSnowie's strategy as much as possible. If, for example, PokerSnowie bluffed too little, the agents would start calling less and therefore pay off PokerSnowie's good hands less. If PokerSnowie bluffed too much, the agents would start calling more and re-raise more aggressively. PokerSnowie tried to defend against those agents by constantly changing its hand ranges. This practice of adapting and learning is an ongoing process which continually improves PokerSnowie's balance and robustness, making it difficult for agents to find exploitable leaks in PokerSnowie’s play.
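The way an exploiting agent reacts to too many or too few bluffs can be illustrated with standard pot-odds arithmetic. The sketch below is not the agents' actual implementation; it assumes a simplified river spot in which the caller holds a hand that beats only bluffs, and the function names are hypothetical. `pot` is the pot before the bet, so a successful call wins `pot + bet` and a failed call loses `bet`.

```python
def call_ev(bluff_fraction, pot, bet):
    """Caller's expected value of calling with a hand that only beats bluffs."""
    return bluff_fraction * (pot + bet) - (1 - bluff_fraction) * bet

def best_response(bluff_fraction, pot, bet):
    """What an exploiting agent does against a given bluffing frequency."""
    return "call" if call_ev(bluff_fraction, pot, bet) > 0 else "fold"

def balanced_bluff_fraction(pot, bet):
    """Bluff share of the betting range at which calling and folding break even."""
    return bet / (pot + 2 * bet)

# Pot-sized bet: bluffing half the time invites calls,
# bluffing one time in five invites folds.
too_many = best_response(0.5, pot=1.0, bet=1.0)   # "call"
too_few = best_response(0.2, pot=1.0, bet=1.0)    # "fold"
```

For a pot-sized bet the break-even point is a bluff in one third of the betting range. A strategy near that point gives the agent nothing to gain by shifting its calling frequency, which is the sense of "balance" used in this section.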
PokerSnowie will periodically release updates to its core AI brain. These version updates incorporate new learning and experience gained from training on large computer clusters and using refined algorithms. The ultimate result of all this ongoing work is that over time PokerSnowie’s advice becomes stronger and more consistent across many different situations.
Detailed information about all major AI releases is available on the PokerSnowie Blog.