Not even four top poker players could out-bluff Libratus.
Carnegie Mellon’s No-Limit Texas Hold 'em software made short work of four of the world’s best professional poker players in Pittsburgh at the grueling "Brains vs. Artificial Intelligence" poker tournament.
Poker now joins chess, Jeopardy!, Go, and many other games at which programs outplay people. But poker differs from all the others in one big way: players have to act on partial, or "imperfect," information.
“Chess and Go are games of perfect information,” explains Libratus co-creator Noam Brown, a Ph.D. candidate at Carnegie Mellon. “All the information in the game is available for both sides to see. Poker is a game of imperfect information, since neither player can see their opponent’s cards,” he writes in an email to The Christian Science Monitor.
In other words, no matter how impressive your card skills, your power of prediction in Texas Hold 'em is limited because only certain cards are visible. An ace in your hand and an ace on the table might look like good news, unless your opponent is holding a pair of aces, or worse. No amount of computational power can reveal which cards an opponent actually holds.
“The typical approach to addressing perfect-information games [like chess] was to search through a game tree to find an optimal path,” he continues. But that’s no good in poker, because without knowing what cards are where, you can’t even figure out where on the tree you are.
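To make the tree search Brown describes concrete, here is a minimal sketch of minimax, the textbook form of that search for two-player perfect-information games. The tiny tree and its scores are hypothetical, chosen only for illustration, and nothing here is taken from Libratus or from any chess program.

```python
def minimax(node, maximizing=True):
    """Return the best score the first player can guarantee from `node`."""
    # A leaf is a plain number: the final score from the first player's view.
    if isinstance(node, (int, float)):
        return node
    # Otherwise the node is a list of child positions; the turn alternates.
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)


# Hypothetical game: the first player picks a branch, then the opponent
# picks a leaf inside that branch.
TREE = [[3, 5], [2, 9], [0, 7]]

print(minimax(TREE))  # 3
```

The search works only because the program always knows exactly which position it is in. In poker the hidden cards mean a player cannot tell which node of the tree the game has reached, so this approach does not carry over directly.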
Libratus is a whole new kind of machine.
Rather than try to sort through an unknowable tree, it focuses on finding a favorable move that represents a so-called “Nash equilibrium” solution, named for Nobel laureate John Nash of “A Beautiful Mind” fame.
The Nash equilibrium is often explained with the Prisoner’s Dilemma, which describes a situation pitting two criminal suspects against each other, incentivized to betray and punished for loyalty. Most people instinctively hit on the mathematically rational solution of betraying the co-conspirator and accepting a medium penalty, even though it isn’t the ideal outcome: both suspects would fare better if each stayed loyal.
The key is that even without knowing whether the co-conspirator will betray you or not, a rational strategy exists. You won’t always end up in the best possible situation, but you may be able to avoid the worst outcome.
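The reasoning can be checked by brute force. The sketch below uses hypothetical prison sentences (the numbers are assumptions, not figures from the article) and tests every pair of choices to see whether either suspect could shorten their own sentence by switching alone. A pair where neither can is a Nash equilibrium. This illustrates the concept only; it is not how Libratus computes its strategy.

```python
from itertools import product

MOVES = ("silent", "betray")

# Hypothetical sentences in years, as (suspect A, suspect B). Lower is better.
YEARS = {
    ("silent", "silent"): (1, 1),
    ("silent", "betray"): (10, 0),
    ("betray", "silent"): (0, 10),
    ("betray", "betray"): (5, 5),
}


def is_nash(a, b):
    """True if neither suspect can cut their own sentence by switching alone."""
    a_years, b_years = YEARS[(a, b)]
    a_ok = all(YEARS[(alt, b)][0] >= a_years for alt in MOVES)
    b_ok = all(YEARS[(a, alt)][1] >= b_years for alt in MOVES)
    return a_ok and b_ok


def pure_nash_equilibria():
    """List every pair of choices that is a Nash equilibrium."""
    return [(a, b) for a, b in product(MOVES, MOVES) if is_nash(a, b)]


print(pure_nash_equilibria())  # [('betray', 'betray')]
```

Mutual betrayal is the only stable pair, even though mutual silence would leave both suspects better off. Each suspect avoids the worst outcome (10 years) without knowing what the other will do.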
Libratus plays poker in a similar way, never guaranteed to win any particular hand but likely to stay in the black over the long run.
The result was a poker tournament unlike any other. People tend to bet and bluff in certain increments, and noticing those patterns helps professionals find an edge, but the computer was hard to read.
The “biggest difference is that most human poker players do what Libratus does, betting with multiple bet sizes, but humans only have one or two usually. Libratus mixes in a bunch, maybe 10 or 15, and even mixes in situations that I thought didn't make sense before this competition started. It would be way too complicated for a human to do this sort of thing correctly,” Dong Kim, the highest performing of the four human players, said in a Reddit interview.
As Libratus was analyzing the day’s games each night, the pros did the same. While they weren’t able to find a consistent winning strategy, they suggest the experience made them better poker players. “Once you face Libratus, there's nothing worse any human could ever do to you. Every human is going to seem like a walk in the park,” said Jason Les, another one of the players.
At the end of 120,000 poker hands played over almost three weeks, Libratus was up 1.77 million poker chips, and all human players were in the red, according to host Rivers Casino. Fortunately for the participants, they won’t have to hand over any money. Instead, they’ll split a $200,000 performance-based reward designed to elicit their best play.
“It was an absolute beatdown. The human team lost at a rate of approximately 2.5 times what the program lost at before,” said professional player Doug Polk, comparing the result with the previous tournament, according to Forbes.
The size of the win was surprising, even to the program’s creators, Tuomas Sandholm and Mr. Brown. “I think everyone expected computers to eventually outdo humans in poker, but the speed at which it happened was definitely surprising. Just 20 months earlier our precursor bot Claudico lost by a fairly wide margin. This time around, we won by a far wider margin. Nobody expected this, not even us. The international betting sites had us as 4:1 underdogs going into the competition,” Brown says.
The imperfect information nature of poker makes the win a huge achievement for the AI community, with far-reaching real world applications that Brown says include negotiations, auctions, and security interactions, to name a few. “In truth, most real-world scenarios involve hidden information. In the real world, not all the information is laid out neatly for all sides to see like pieces on a chessboard. There is uncertainty and deception,” he explains.
Multiplayer situations remain one enclave of human superiority, however. Libratus prefers bilateral dealing to group negotiation. It took on each poker player one at a time, and couldn't have won in an ensemble game. Still, two-sided negotiations are common in the real world, and Libratus lays the foundations for computer programs that help negotiators elevate the art of the deal to a science.
Brown continues, “None of the algorithms in Libratus are specific to poker. We did not program it to play poker. We programmed it to learn any imperfect-information game, and fed it the rules of No-Limit Texas Hold'em as a way to evaluate its performance.”
With AI quickly learning to navigate both Go’s practically infinite possibility tree and poker’s fog of war, even the experts are running out of milestones to topple. Writing to the Monitor from a workshop called “What’s next for AI in games,” Brown had this to say about the future of artificial intelligence: “I think many people in AI are asking what comes next, and the answer isn't clear. But either way it's clear that AI will continue to make major advances in the coming years.”