I'm a Chess Champion. Here's Why I Play Chess Against ChatGPT
Jennifer Shahade
Tue, April 14, 2026 at 10:09 AM UTC
akinbostanci / Getty Images
Large language models (LLMs) are bad at chess.
And yet, as a three-time National Chess Champion and a two-time U.S. Women's Chess Champion, I love to play against them. Not because they push me to play my best, but because of what they reveal about human nature.
Playing chess with LLMs has taught me how uniquely creative and diverse human beings are, how susceptible humans are to flattery and sycophancy, and how AI is beginning to shape human behavior.
LLMs are not meant to play chess well at all. After all, they are designed to predict what's most likely to come next and to flatter us. AI-powered chess algorithms aren't trying to crush you; they are trying to keep you playing. But in their interestingly bad chess play, we can learn lessons beyond the table or the token.
Superhuman chess AI programs, from Deep Blue, which beat Garry Kasparov in 1997, to DeepMind's "AlphaZero", can consistently beat any human player. But most humans don't play the top chess computers anymore because defeat is a foregone conclusion. Getting destroyed again and again can only teach you so much. Experimenting with LLMs, on the other hand, can be exhilarating.
When I first challenged ChatGPT-4 to a chess game, it played decently, but I still got a great position after 15 moves and won a knight. Just as my advantage mounted, it hallucinated a phantom piece to recapture my queen. In other words, it cheated! At first, this didn't make much sense. Aren't off-the-rack LLMs more known for sycophancy than for stealing?
So I started to play the worst moves I could think of against ChatGPT. It bent the rules yet again, but this time in my favor. Phantom pieces replaced the pieces I had blundered. Whether I played better or worse than ChatGPT, it ended up pulling me to its own level. It wasn't always cheating, but it was always confabulating. When humans confabulate, we fill in the gaps of our memories or dreams with the most logical sequence. ChatGPT was doing the same thing.
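Phantom-piece moves like these are easy to catch mechanically: a wrapper that tracks the real board state can reject any move that names a piece that isn't actually on the claimed square. Here is a minimal, dependency-free sketch of that guard; the board encoding and the move format are assumptions made up for illustration, not anything ChatGPT or chess software actually exposes.

```python
# Toy guard against LLM "phantom piece" confabulations.
# `board` maps squares like "e4" to piece codes like "wN" (white knight);
# a move is written "wN e4 f6": white knight from e4 to f6.
# Both conventions are invented here purely for illustration.

def is_confabulated(board: dict, move: str) -> bool:
    """Flag a move that claims a piece on a square where it does not exist."""
    piece, src, _dst = move.split()
    # A phantom move references an empty square or the wrong piece.
    return board.get(src) != piece

# A toy position: white knight on e4, white queen on d1.
position = {"e4": "wN", "d1": "wQ"}

print(is_confabulated(position, "wN e4 f6"))  # the knight really is on e4
print(is_confabulated(position, "wB c1 g5"))  # no bishop on c1: phantom piece
```

A full legality check would also need to verify that the destination square is reachable, but even this shallow existence test catches the kind of invented recapture described above.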
I have found that LLM hallucinations are more likely to occur when the model tries to execute "long moves," which cross the entire board. This mirrors how LLMs struggle with long conversations.
When Google hosted the top LLMs in a tournament, 42 of the 47 games opened with the Sicilian Defense, also favored by Bobby Fischer and by the fictional Beth Harmon of The Queen's Gambit. Why so much Sicilian love? Because it's the most popular opening. Recent DeepMind research showed the same effect when researchers attempted to create creative, aesthetically pleasing, and counterintuitive chess positions. They found that AI often "collapses" on itself, repeating the same sorts of themes and patterns that it deemed "beautiful."
In the case of DeepMind's chess beauty program, researchers were able to reduce this by explicitly programming for more diversity. But even with vast training data, probabilistic output, and diversity filters, it's not easy to mimic the variation and range of human thought.
To be sure, LLMs and AI more broadly are not the only technologies that struggle to capture the diversity of the human experience. Take the algorithmic, winner-take-all dynamics of social media, in which conforming to what the average user wants gets you more clicks, attention, and money. To avoid falling into the pull of a mono-voice and a monoculture, we must seek out diversity in our sources, prompts, and inputs. As Haruki Murakami wrote: "If you only read the books that everyone else is reading, you can only think what everyone else is thinking."
Like chess engines, LLMs will only get better, and we have to prepare for that future. Chess has wrestled for decades with keeping the game fair despite superhuman AI. Electronic devices have long been prohibited in chess competitions, but that has not stopped cheating from disrupting the field.
In perhaps the most prominent chess cheating scandal ever, the top-ranked Magnus Carlsen lost to then-19-year-old Grandmaster Hans Niemann in 2022. Carlsen dropped out of the tournament, and it was later revealed that Niemann had cheated in past online games. Though there was never any evidence to suggest he cheated against Carlsen, outlandish theories went viral, such as one suggesting that anal beads were used to relay moves from an AI. Since then, live event broadcasts have added time delays and increased surveillance. Despite these measures, cheating accusations and scandals are still common. Some are valid. Others are thin on evidence, boosted by drama-thirsty social media algorithms and heightened by fears of AI-based cheating.
What this teaches us is that building fancier cheat-detection tools will be insufficient in the AI-driven future. Instead, we need to build trust and integrity throughout our communities. This is something that AI cannot do for us.
It also teaches us that we cannot be naive about the complexities of our AI-driven future. Instead, we need to find positive ways to leverage AI.
Chess players have become experts at calibrating our use of AI for training and preparation, in which we review our own games and those of our opponents. The sweet spot is to expand and refine our list of possible moves, but not so much that we stop thinking for ourselves. I like the sandwich method. I start with my own brain (the bread), then I look at what the AI has to say about the situation (the tuna fish), and then I go back to thinking about the takeaways using my own brain.
LLMs cut both ways: they can make us sharper and smarter, or they can make us duller and more average, only able to think when we have a computer nearby. When playing chess against LLMs, we can see more clearly some of their strengths and limitations as coach or confidante, so we know when to say, "Goodnight Gemini."