The lacklustre reception of the recent GPT-5 release1 (Aug 2025) was a reminder that we should expect slower progress from future LLMs. The current LLM design has fundamental limits when it comes to achieving artificial general intelligence (AGI).
Besides not being able to learn easily after their very long and expensive training runs have completed, current LLMs seem unable to count2 or do even simple arithmetic3. They fail at algorithmic reasoning tasks4 and give wrong responses with confidence (i.e. hallucinate). And they just can’t seem to follow the rules of chess5!
Therefore, most chatbot providers have now pivoted to agent-like systems that enrich the LLM’s context with past conversations, chain-of-thought, and tools. These tools are specifically designed to patch LLM weaknesses by offloading tasks such as math and code execution.
This sounds an awful lot like what I call the “Frankenstein stage of technology development”. There is little left to squeeze out of the core idea and the only progress consists of bolting on more stuff. Kind of like when mobile phones just started to get more cameras after the initial rapid improvements in terms of screen size, processing power and miniaturization had reached their limits6.
Agents. A Blast from the Past.
The idea of an “agent” was developed over 50 years ago to describe learning behaviour that involves repeated interaction with an environment and feedback or rewards. Both psychology and early AI (or cybernetics) research adopted the concept and refined it7.
Richard S. Sutton formalized the concept of an autonomous agent in his 1984 PhD dissertation8: An agent interacts with an environment in discrete time steps $t$. The agent’s entire knowledge and perception of the environment at time $t$ is captured in the state $S_t$. Based on this state, the agent selects an action $A_t$. The agent then receives a state update $S_{t+1}$ and a reward $R_{t+1}$, which is the key learning signal in reinforcement learning (RL).
The basic agent-environment interaction loop.
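To make the loop concrete, here is a minimal Python sketch of this interaction. The `Environment` and `Agent` classes, their toy dynamics, and the fixed policy are invented for illustration only, not taken from Sutton’s work:

```python
# Minimal sketch of the classic agent-environment loop (toy classes for illustration).

class Environment:
    def reset(self):
        """Return the initial state S_0."""
        return 0

    def step(self, action):
        """Apply the action; return the next state S_{t+1}, the reward R_{t+1}, and a done flag."""
        next_state = action                      # toy dynamics
        reward = 1.0 if action == 1 else 0.0     # toy reward signal
        return next_state, reward, False


class Agent:
    def select_action(self, state):
        """Choose an action A_t based on the current state S_t (toy fixed policy)."""
        return 1

    def learn(self, state, action, reward, next_state):
        """Online update from the observed transition -- the step modern chatbots skip."""
        pass


env, agent = Environment(), Agent()
state = env.reset()
for t in range(10):                                   # discrete time steps t
    action = agent.select_action(state)               # pick A_t from S_t
    next_state, reward, done = env.step(action)       # receive S_{t+1} and R_{t+1}
    agent.learn(state, action, reward, next_state)    # adjust behaviour from the reward
    state = next_state
    if done:
        break
```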
Modern LLM-based chatbots (or so-called “agentic” AI) draw on some, but not all, of the characteristics of this classic RL agent. They interact with the environment (i.e. the user) in discrete steps, and their actions consist of generating tokens. Chatbots also update their state, e.g. by saving the user’s response to their conversational history. Additionally, they “act” by using tools, and their “environment” can be their own output during chain-of-thought.
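Read through that lens, a tool-using chat turn can be sketched roughly as below. `call_llm` and `run_tool` are hypothetical stand-ins for an LLM API call and a tool executor, not any particular vendor’s SDK:

```python
# Rough sketch of one chatbot turn viewed as an agent-environment interaction.
# `call_llm` and `run_tool` are hypothetical stand-ins, not a real SDK.

def call_llm(messages):
    """Stand-in for the LLM: returns either a text reply or a tool request."""
    return {"type": "text", "content": "Hello! How can I help?"}

def run_tool(request):
    """Stand-in for a tool call, e.g. a calculator or a code sandbox."""
    return "tool output"

def chatbot_turn(state, user_message):
    state.append({"role": "user", "content": user_message})      # state update from the user
    while True:
        action = call_llm(state)                                 # "action" = generating tokens
        if action["type"] == "tool_call":                        # tools patch LLM weaknesses
            state.append({"role": "tool", "content": run_tool(action)})
            continue                                             # the model's own output feeds back as its environment
        state.append({"role": "assistant", "content": action["content"]})
        return action["content"]

conversation = [{"role": "system", "content": "You are a helpful assistant."}]  # the "state": the context
print(chatbot_turn(conversation, "Hi there"))
```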
However, chatbots deviate from Sutton’s classic RL agent definition due to the lack of continuous online learning. Chatbots are mostly “fixed” at inference time. Reinforcement learning from human feedback (RLHF) is used during LLM fine-tuning, but not during the chat session.
The thumbs up/down that users can provide and other session metrics are probably anonymized and aggregated before being used for offline model fine-tuning. The closest chatbots come to learning within a session is the way they change their behaviour based on the chat history and user preferences (e.g. the system prompt). But anyone who has tried to correct a chatbot knows the limits of the teaching-via-context approach. Without weight adjustments, chatbots tend to eventually revert to their original behaviour.
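The contrast can be made explicit with a hedged sketch: in-session feedback is merely logged for a later offline fine-tuning run, whereas a classic online RL agent would adjust its policy after every reward. All names below are illustrative, not any provider’s actual pipeline:

```python
# Illustrative contrast between offline feedback collection and online RL updates.
# All function and variable names are made up for this sketch.

feedback_log = []

def record_feedback(session_id, turn, thumbs_up):
    """Chatbot path: the signal is only logged; model weights stay frozen during the session."""
    feedback_log.append({"session": session_id, "turn": turn, "reward": 1 if thumbs_up else -1})

def offline_finetune(model_weights, log):
    """Much later: anonymized, aggregated logs feed an offline RLHF-style update."""
    ...  # happens outside any individual chat session

def online_rl_update(policy, state, action, reward, lr=0.01):
    """Classic RL path: the policy changes immediately after each observed reward."""
    policy[(state, action)] = policy.get((state, action), 0.0) + lr * reward
    return policy
```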
Conclusion
LLM progress has slowed even as the effort to improve the paradigm has increased tremendously. Billions of dollars are currently being invested in squeezing out marginal gains that don’t fundamentally make LLMs more intelligent9. This fact alone should give pause to anyone hoping that LLMs could become AGI.
Of course, it’s possible that there will be another game-changer like the transformer architecture10, but this is increasingly unlikely given that thousands of publications each year haven’t delivered any major breakthrough.
The autoregressive transformer architecture11 does not seem to be suitable for true intelligence, which is the ability to “achieve goals in a wide range of (novel) environments”12. For now, LLMs mostly retrieve and interpolate between the language examples they were trained on. By definition, then, they cannot generalize to responses that lie outside of, or conflict with, their training data.
While the LLM-chatbot loop sounds similar to the classic RL agent–environment interaction paradigm, it still differs fundamentally. Chatbots don’t learn online, and even if they did, it’s not clear whether the classic RL agent approach would be feasible given the enormous number of states such a system could be in. I’m skeptical that LLMs themselves will be the key to AGI, despite the advances of LLM-based agents over “naked” LLMs.
- ChatGPT users hate GPT-5’s “overworked secretary” energy, miss their GPT-4o buddy
- “Strawberry”
- Limitations of Language Models in Arithmetic and Symbolic Induction
- GPT-5 and GPT-5 Thinking as Other LLMs in Chess: Illegal Move After 4th Turn
- Another example of the Frankenstein stage of development was the auto-ML craze, when the data science industry became fixated on training every available model under the sun with every possible feature combination (2016). This brute-force approach might have worked well to win Kaggle competitions, but it also signalled that traditional machine-learning techniques had run out of fresh ideas. Lacking innovation, the industry leaned on doing more rather than doing better. Ironically, the companies building these auto-ML systems still profited, since the trend drove demand for more compute, storage, and infrastructure. (🤔 Hmmm. These are some interesting parallels to the current chatbot industry.)
- The conceptual foundations of the agent-environment model were emerging well before modern reinforcement learning. In cybernetics (Norbert Wiener, 1948) and optimal control theory (Bellman, Kalman, Pontryagin), researchers explored how systems could adapt their behaviour to achieve goals in dynamic environments through feedback loops. In animal learning theory, psychologists and neuropsychologists framed behaviour as a process of stimulus-response interactions shaped by reinforcement (Thorndike, Skinner, Hebb). They theorized that intelligent behaviour arises from continuous interaction between an agent and its environment, where actions influence future observations and rewards.
- Sutton, R. S. (1984). Temporal credit assignment in reinforcement learning. Doctoral dissertation, University of Massachusetts Amherst.
- For example: Quantization, KV-cache optimization, Mixture-of-Experts, speculative decoding, distillation.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. arXiv.
- To be more precise: autoregressive deep learning multi-head self-attention transformer architecture with tokenized input/output.
- François Chollet’s “On the Measure of Intelligence” (2019): “Intelligence measures an agent’s ability to achieve goals in a wide range of environments — including ones it has never encountered before.”