The Halting Problem in AI Coding

Last Updated: March 10, 2026

Introduction to the Halting Problem in Mathematics

There is a short list of ideas from early computer science that are (a) worth knowing and (b) just plain fun to think about. One of these is the Halting Problem. This is a landmark undecidability problem from computability theory - the field that describes what can and cannot be computed.

A physical implementation of a Turing machine. Will it ever stop?

The problem centres on the question of whether any given program will eventually stop running or get stuck in an infinite loop (assuming the hardware allows it to run forever). Turing proved in 1936 that no algorithm can decide this for all programs: whether an arbitrary program halts is, in general, mathematically undecidable. Individual programs can often be shown to halt or to loop, but no single procedure gets the answer right for every program. This is the best-known undecidability result in computing, though far from the only one. It belongs to a body of results developed by luminaries of early 20th Century mathematics and computing, including Hilbert, Gödel, Church and Turing, and the line of work culminated in Rice's theorem, published in 1953.

Rice's theorem generalises the halting problem to say that every non-trivial semantic property of programs is undecidable. Here, a semantic property concerns what a program does rather than how its code is written, and it is non-trivial if some programs have it and others do not. For example, it is not possible to determine in general whether a program will produce a specific output for a given input, or whether a program computes a particular function.
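The impossibility argument can be sketched in a few lines of Python. This is a simplified illustration, not a formal proof: it only considers programs that take no input, and a Python closure stands in for a program being handed its own source code. The names `make_contrary` and `always_says_loops` are hypothetical, introduced only for this example. Given any candidate halting decider, we build a program that asks the decider about itself and then does the opposite.

```python
from typing import Callable

Program = Callable[[], None]
Decider = Callable[[Program], bool]  # True means "this program halts"


def make_contrary(claims_to_halt: Decider) -> Program:
    """Build a program that does the opposite of what the decider predicts for it."""

    def contrary() -> None:
        if claims_to_halt(contrary):
            # The decider predicted "halts", so loop forever to prove it wrong.
            while True:
                pass
        # The decider predicted "runs forever", so return at once to prove it wrong.

    return contrary


def always_says_loops(program: Program) -> bool:
    """A deliberately naive candidate decider: it claims nothing ever halts."""
    return False


contrary = make_contrary(always_says_loops)
prediction = always_says_loops(contrary)  # False, i.e. "runs forever"
contrary()  # ...yet the call returns immediately, so the prediction was wrong
```

A decider that answered `True` instead would be wrong in the other direction, because `contrary` would then loop forever. Whatever the candidate decider does, it is wrong about the program built from it, which is the heart of the contradiction.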

Why This Matters for AI-Assisted Software Development

This has some important consequences for software development and testing in the age of generative AI and coding agents (AI-DLC). Static analysis of code can already be done with linters, type checkers, model checkers and formal verification tools (like TLA+), and these are very powerful. We would obviously want to enhance these methods with new generative AI technologies. However, Rice's theorem means that static analysis by LLMs, impressive as it is, can never be guaranteed complete, just like any other method. Even if we reach the mythical AGI, no LLM can predict the behaviour of every program just from reading its code, and the programs it gets wrong may include yours.

Take halting, for example. Halting within a given amount of time is an important behaviour of code, especially in a world where computation is a physical process that has to be bought and paid for, and users usually want outputs. To determine this, the agent will eventually just have to execute the code. Note that I say "given amount of time": a run that is cut off after a fixed budget can only tell us that the code did not halt within that budget, not that it never will.
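A minimal sketch of this kind of bounded check, assuming the code under test is a Python snippet that can be run in a child process. The helper name `halted_within` is hypothetical. It uses the standard library's `subprocess.run` with a `timeout`, which kills the child and raises `TimeoutExpired` if the budget runs out.

```python
import subprocess
import sys


def halted_within(source: str, seconds: float) -> bool:
    """Run `source` in a fresh Python process and report whether it finished in time.

    False means only "did not halt within the budget", never "does not halt".
    """
    try:
        subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,  # keep the child's output off our console
            timeout=seconds,
        )
        return True
    except subprocess.TimeoutExpired:
        return False


quick = halted_within("print(sum(range(1000)))", seconds=10.0)
stuck = halted_within("while True: pass", seconds=0.5)
```

The asymmetry is the point: a `True` result is conclusive evidence that the program halts on that input, while a `False` result is only evidence about the chosen time budget.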

This is a fundamental limit on AI testing even before we introduce anything complicated like distributed systems into the mix, which can add more undecidability problems.

The Case for Execution-Based Testing

One way to work around this is to build an environment where the code can be executed, and then use an agent to define the inputs and expected outputs. This is already done using hand-written CI/CD processes and integration tests, but these are very time-intensive to write and maintain, especially when a codebase is changing quickly. As we adopt agents to write more code, codebases will change ever more quickly. AI agents have developed to the point where they can autonomously write the correct configs to start a virtual machine, and repeatedly run code on that virtual machine. Where we have distributed systems, this can be done for each machine in the network, and we can run the whole system in order to probe behaviours such as race conditions or cascading failures (though non-deterministic behaviours may require repeated runs or chaos-engineering techniques to find). The same applies to systems made up of multiple applications.
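The core loop described above, pairing inputs with expected outputs and checking them by execution, can be sketched as follows. This is a toy illustration under stated assumptions, not how any particular tool works: the program under test is a Python one-liner that reads stdin, and the names `Case`, `run_case` and `SHOUT` are hypothetical. In practice the cases would be proposed by an agent and the target would be a real service rather than a snippet.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Case:
    stdin: str            # input fed to the program
    expected_stdout: str  # output we expect, compared after stripping whitespace


def run_case(source: str, case: Case, timeout_s: float = 10.0) -> str:
    """Execute `source` on one case and return 'pass', 'fail' or 'timeout'."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            input=case.stdin,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    if result.returncode == 0 and result.stdout.strip() == case.expected_stdout:
        return "pass"
    return "fail"


# Program under test: upper-case whatever arrives on stdin.
SHOUT = "import sys; print(sys.stdin.read().strip().upper())"

cases = [Case("hello", "HELLO"), Case("abc", "abd")]  # the second expectation is wrong on purpose
results = [run_case(SHOUT, case) for case in cases]
```

Each result is empirical evidence about one input only. Three outcomes are reported rather than two because a timeout is a different kind of finding from a wrong answer.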

The Challenge of Distributed Systems

Monitoring and analysing the behaviour of such a distributed system is inherently difficult, hence the existence of the multibillion dollar observability market. Anticipating what effect any one change in the code will have across the system is difficult for engineers. Leslie Lamport once described a distributed system as "one in which the failure of a computer you didn't even know existed can render your own computer unusable". The problem is no less difficult for even advanced AI agents. All models have a limited context window that narrows their view of a system. This limitation prevents them from exploring and anticipating many of the possible behaviours of a complex software system, and the undecidability results discussed above reinforce this. Further, as agents write code in increasing volume and at increasing speed, it is becoming untenable for an engineer to understand the code well enough to anticipate how any one change will affect the whole system.

How Kerno Solves This

At Kerno, we believe that anyone writing production software with AI assistance, especially in distributed systems, will need to do execution-based testing, and we are building a way to do it. Kerno can take a repository, index the code, and then write integration tests for any endpoint in the system. It can then orchestrate and run any of the services needed to perform these tests by executing the code, and it does all of this with agents. Further, it can monitor changes made to the codebase, adapt the tests to the code, and give the user a clear measure of how a change to the code will change the behaviour of the system. Execution-based testing is not a panacea for software development. It cannot give the user certainty, but it can provide empirical evidence for behaviours, and time bounds for the execution of individual steps. Kerno can make this process markedly easier.
