“Superhuman” Go AIs still have trouble defending against these simple exploits
In the ancient Japanese game of Go, state-of-the-art artificial intelligence has generally been able to defeat the best human players since at least 2016. But in the last few years, researchers have discovered flaws in these top-level AI Go algorithms that give humans a fighting chance. By using unorthodox “cyclic” strategies—ones that even a beginning human player could detect and defeat—a crafty human can often exploit gaps in a top-level AI’s strategy and fool the algorithm into a loss.
Researchers at MIT and FAR AI wanted to see if they could improve this “worst case” performance in otherwise “superhuman” AI Go algorithms, testing a trio of methods to harden the top-level KataGo algorithm‘s defenses against adversarial attacks. The results show that creating truly robust, unexploitable AIs may be difficult, even in areas as tightly controlled as board games.
Three failed strategies
In the pre-print paper “Can Go AIs be adversarially robust?”, the researchers aim to create a Go AI that is truly “robust” against any and all attacks. That means an algorithm that can’t be fooled into “game-losing blunders that a human would not commit,” but also one that would require any competing AI algorithm to spend significant computing resources to defeat it. Ideally, a robust algorithm should also be able to overcome potential exploits by using additional computing resources when confronted with unfamiliar situations.