Button-pushing explorers: How to grasp that AI agents can do amazing things while knowing nothing
- Written by Ji Y. Son, Professor of Psychology, California State University, Los Angeles
The nonprofit ARC Prize Foundation[1] on May 1, 2026, released the results of a new benchmark[2]: a test of an AI system’s ability to solve a game. The results were striking – humans scored 100%, while the most advanced AI systems scored under 1%.
At first glance, this may be surprising to AI users who are impressed by the polished essays, codebases and multistep projects it generates in seconds. How can such brilliant AI systems struggle with simple Tetris-shape puzzles?
That confusion points to a risk: AI is becoming integrated into everyday life faster than people can make sense of it.
We are cognitive psychologists[3] who study[4] how to teach difficult concepts. To recognize the limits and risks of today’s AI agent systems, it’s important for people to grasp that the systems can both accomplish superhuman feats and make mistakes few humans would. To that end, we propose a new way to think about AIs: as button-pushing explorers.
Mental models for AI
We teach college students, a group rapidly incorporating AI tools into their daily routines. That gives us regular opportunities to ask what they think is going on with AI. The answers vary widely. One student said that someone at OpenAI or Anthropic is reading and approving every response the system generates. Another, more succinctly, said, “It’s magic.”
These responses illustrate two tempting ways of making sense of AI. At one extreme, AI is treated as an inscrutable black box – a powerful but ultimately mysterious force. At the other, people explain it using the same assumptions they use to understand other humans: that its outputs reflect reasoning or judgment.
The worry is that these misinterpretations don’t fade as users gain experience interacting with AI – they might even get reinforced[5]. When AI performs well, its output can feel like evidence of understanding, or confirmation that it really is something like magic. That apparent success makes it harder to question what the system is actually doing. Biases can seem logical or inevitable; harmful behavior can look like a deliberate choice, or even like fate, as if it could not have gone any other way.
Cognitive scientist Anil Seth explains why AIs don’t have – and won’t have – consciousness.

Saying that AI models are shaped by patterns in data, training processes and system design is true, but that’s too abstract to tell people when to trust the systems’ outputs or when they might fail. To help people avoid misplaced trust in AI, AI literacy efforts will need to include some mechanistic understanding of what produces the systems’ behavior – explanations that are perhaps not perfectly accurate, but useful. As statistician George Box[6] once wrote[7], “All models are wrong, but some are useful.”
Researchers have come up with several mental models for large language models. One is “stochastic parrot[8],” which conveys that the models use statistical methods – “stochastic” refers to probabilities – to mimic responses without any understanding of meaning. Another is “bag of words[9],” which emphasizes that the models are essentially vast collections of words – for example, all English words found on the internet – paired with a mechanism for returning the best set of words in response to your prompt.
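To make “statistical methods” concrete, here is a toy sketch in Python that generates text purely from how often words followed one another in some training text – no grasp of meaning anywhere. Real large language models are vastly more sophisticated, but the spirit of the metaphor is the same. The function names and the training sentence here are made-up placeholders, not part of any real system.

```python
import random
from collections import defaultdict

def train_bigrams(text):
    """Count which words followed which other words in the training text."""
    followers = defaultdict(list)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers

def continue_text(followers, start, length=10):
    """Extend a prompt by repeatedly sampling a statistically likely next word."""
    out = [start]
    for _ in range(length):
        options = followers.get(out[-1])
        if not options:
            break
        # random.choice over the raw list weights words by observed frequency.
        out.append(random.choice(options))
    return " ".join(out)

# Placeholder training text, just to show the mechanism:
followers = train_bigrams("the cat sat on the mat and the cat ran")
print(continue_text(followers, "the"))
```

The output can look sentence-like, but nothing in the program knows what a cat or a mat is – it only knows which words tended to come next.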
These ways of thinking about large language models were never meant to be complete accounts of the systems. But the metaphors serve an important cognitive purpose: They push back against the idea that fluent language is necessarily caused by humanlike understanding.
As the AI systems people use become increasingly powerful agents capable of stringing together actions on their own, people need a different kind of mental model: one that explains how these agents act. One place to find such a model is in earlier research on AI systems that learned to play Atari 2600 games. These systems didn’t understand the games the way humans do, but they still managed to rack up a lot of points.
The simple loop: Act, observe, adjust
Imagine a neural network, a relatively simple kind of AI model, placed into a video game it has never seen before. It does not “understand” the game like a human would. It has no idea whether it’s shooting space invaders or navigating an ancient pyramid. It doesn’t know the goals or rules.
Instead, it learns to play through a simple loop: Take an action – move left, jump, shoot – observe what changes, and then adjust. If an action leads to a good outcome, such as gaining points, it adjusts to become more likely to take similar actions in similar situations. If it leads to a bad outcome, such as losing a life, it adjusts in the opposite direction.
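For readers who want to see the loop concretely, here is a minimal sketch in Python of the kind of learning rule involved – tabular Q-learning, a simpler relative of the neural-network versions used on Atari. The `ToyGame` environment, action names and parameter values are hypothetical stand-ins, not from any real system.

```python
import random
from collections import defaultdict

# A toy stand-in for a game: three "rooms" in a row; pushing right
# from the last room scores a point. The learner is never told this.
class ToyGame:
    def __init__(self):
        self.state = 0

    def step(self, action):
        if action == "right":
            self.state = min(self.state + 1, 2)
        elif action == "left":
            self.state = max(self.state - 1, 0)
        reward = 1.0 if (self.state == 2 and action == "right") else 0.0
        return self.state, reward

ACTIONS = ["left", "right", "jump", "shoot"]
ALPHA = 0.1    # how strongly each outcome adjusts the estimate
GAMMA = 0.9    # how much anticipated future reward counts
EPSILON = 0.1  # how often to try a random action instead of the best-known one
q_table = defaultdict(float)  # "how good does each action look in each situation?"

def choose_action(state):
    """Mostly pick what has worked best here; occasionally try something random."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

# The loop itself: act, observe, adjust.
game = ToyGame()
state = game.state
for _ in range(5000):
    action = choose_action(state)            # act
    new_state, reward = game.step(action)    # observe
    best_next = max(q_table[(new_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (target - q_table[(state, action)])  # adjust
    state = new_state

print({a: round(q_table[(2, a)], 2) for a in ACTIONS})  # "right" ends up looking best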
Even this simple mechanism can produce surprisingly capable behavior. Over time, by repeating this loop, such neural networks learned to play a wide range of Atari games – but not all of them.
There is one game that famously stumped these early neural networks: Montezuma’s Revenge[10]. To make progress, a player must carry out a long sequence of actions – climbing ladders, avoiding obstacles, retrieving keys – before receiving any reward at all. Unlike in simpler games, most actions offer very little immediate feedback. The game requires something like goal-directed, long-term planning.
Early neural networks would try a few actions, receive no reward and fail to make further progress through Montezuma’s underground pyramid. From the system’s perspective, all actions looked equally useless. But researchers made a breakthrough by changing the feedback signal[11]. Instead of rewarding only success, they also rewarded the system for doing something new. The rewards were for visiting parts of the game it had not seen before or trying actions it had not previously taken. This tweak encouraged exploration.
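The change can be sketched in a few lines. The version below uses a simple count-based bonus – a simplified relative of the pseudo-count approach in the cited research, not the exact method – and the names are illustrative. The learner would feed `reward_with_novelty(...)` into its adjust step in place of the raw game score.

```python
from collections import defaultdict

visit_counts = defaultdict(int)  # how many times each game situation has been seen
BONUS_SCALE = 0.1                # illustrative value; tuned in practice

def reward_with_novelty(state, game_reward):
    """Top up the game's own score with a bonus for rarely visited states.

    The bonus shrinks as a state becomes familiar, so even when the game
    hands out no points, the system is still nudged toward parts of the
    game it hasn't explored yet.
    """
    visit_counts[state] += 1
    novelty_bonus = BONUS_SCALE / (visit_counts[state] ** 0.5)
    return game_reward + novelty_bonus
```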
In 2016, Google DeepMind rewarded its AI model for exploration – try something, see what happens, adjust – while playing the Atari 2600 game Montezuma’s Revenge, a game notoriously difficult for AIs.

With that change, performance improved dramatically. The neural network began navigating obstacles, taking multiple steps toward goals and adapting when things went wrong. From the outside, this kind of behavior can look like planning or problem-solving. But it was not produced by sophisticated planning abilities. The underlying mechanism is still the same simple loop: act, observe, adjust.
This kind of system isn’t a stochastic parrot or a bag of words. It’s closer to a button-pushing explorer: something that doesn’t understand the world in a human sense but moves forward by pushing buttons, seeing what happens and adjusting what it does next.
From video games to modern AI agents
Today’s AI systems can do far more than play games like Montezuma’s Revenge. They can coordinate tools, write and run code, and carry out multistep projects. The range of possible actions is much larger, and the environments in which they operate are increasingly complex.
But these agents are still fundamentally button-pushing explorers. The behavior can be sophisticated, but the process that produces it is not. Humans can often infer how a new environment works after just a few observations. Systems that rely on these feedback loops cannot. They need to try many actions and see what happens before they can make progress.
This helps explain both the strengths of these AI systems and some of their most concerning failures. What these agents learn depends on what is being rewarded. And in real-world systems, those reward signals are often imperfect.
AI systems that conduct negotiations aim to maximize their client’s interests[12], sometimes with deceptive tactics. Rental pricing software used by landlords ends up price fixing[13]. Marketing tools generate persuasive but misleading reviews[14].
These systems aren’t trying to be evil or greedy. They are adjusting to the signals they are given. From the button-pushing explorer perspective, these failures are downright predictable.
Effective AI literacy means holding two ideas at once: These systems can do surprisingly complex things, and they are not doing them the way humans do. If AI is seen as humanlike or magical, its outputs feel authoritative. But if it is understood, even imperfectly, as a button-pushing explorer shaped by feedback, people are likely to ask better questions: Why is it doing this? What shaped this behavior? What might it be missing?
That’s the difference between being impressed by AI and being able to reason about it.
References
- ^ ARC Prize Foundation (arcprize.org)
- ^ results of a new benchmark (arcprize.org)
- ^ cognitive psychologists (scholar.google.com)
- ^ who study (scholar.google.com)
- ^ might get reinforced (doi.org)
- ^ George Box (doi.org)
- ^ once wrote (books.google.com)
- ^ stochastic parrot (doi.org)
- ^ bag of words (www.experimental-history.com)
- ^ Montezuma’s Revenge (store.steampowered.com)
- ^ changing the feedback signal (doi.org)
- ^ maximize their client’s interests (doi.org)
- ^ price fixing (theconversation.com)
- ^ misleading reviews (www.ftc.gov)