Aleksey Bilogur

Kaggle kernels are Turing complete

02/25/2018

One joke that's in fashion amongst computer programming circles is proving that systems that aren't meant to be a certain thing called "Turing complete" actually are.

An instruction set, programming language, or other set of data-manipulation rules is considered to be Turing complete (named for Alan Turing, a brilliant World War II era computer scientist) if it provides everything you need to implement a computer. Furthermore, he showed that any computer which is Turing complete may be simulated by any other computer which is Turing complete.

Some things are obviously Turing complete. The instruction set on the central processing unit of the computer you're reading this webpage on, for example, or any serious programming language you can think of. However, the trick is that the minimum requirements for Turing completeness are actually very weak. "Accidentally Turing Complete" is the best accounting I'm aware of of weird things that are, well, Turing complete. Highlights include the Magic: The Gathering trading card game, the web-page styling language CSS, the database query language SQL, the scenario editor in the video game Age of Empires II, and my personal favorite, Microsoft Powerpoint:

Tom Wildenhain's proof hinges on the (already proven) fact that only a Turing machine can implement the "palindrome recognizer" algorithm he demonstrates. Another, similar way of proving an instruction set is Turing complete is by using it to implement another instruction set already known to be Turing complete. The most convenient of these is the Wang B-machine. The B-machine imagines that you have an infinitely long tape, on which you may operate using just the following four instructions:

Move the tape right one.
Move the tape left one.
Print a * at the current tape position. Then move right one.
If the current tape position is marked with an *, skip to instruction n. Otherwise, continue to the next instruction.

Amazingly, these four instructions are all it takes to build a computer. That means that, theoretically, any software can be implemented in Microsoft PowerPoint. Even incredibly difficult programs, like the ones in self-driving cars and lunar landing modules, are possible using nothing but Microsoft PowerPoint. This surprising insight shows just how broad the universe of "computable things" is.

Kaggle is a data science and machine learning platform and community which features, amongst other fetes, Kaggle Kernels, a managed environment for running code on the cloud. A great blog post by Yufeng G, a developer advocate at Google, gets a bit more into it. If you're familiar with Jupyter notebooks, kernels are like that, but in the cloud.

Due to some interesting feature choices by Kaggle, kernels on the platform are, technically, Turing complete. In other words it's possible to build a Turing machine (in this case, a Wang B-Machine) using the kernels themselves (not the code running inside of them) as instructions.

A Kaggle kernel may be run infinitely many times, with each kernel run resulting in a new kernel version. In order to do anything interesting, kernels need to have data to operate on; that data is provided by datasets uploaded to the platform, or other kernels, or even the kernel you are currently working on! The original use case for this feature is model ensembling: combined machine learning models which average the results of several models together to produce a "best-of" overall result. For one particular problem, one user even implemented a kernel that created a model that produced a statistically better result every time it was run!

This "self-reading" kernel feature "opens the door" to Turing completeness. The "Kaggle Kernels are Turing Complete" kernel demonstrates. This kernel defines a tape, one of each of Wang's four fundamental methods, and a small sample program (in Python) using them. Each time the kernel is run, the program reads its local data storage to retrieve the current program state: the tape (and marks thereof), the tape position, and the program position. Then it performs the B-machine instruction at the given position in the program. Finally it saves the world state (the tape, the tape position, and the program position) back to disc before exiting. All of this is done inside of this little code snippet:

If you have a Kaggle account, you can try it yourself. Fork the kernel, then change the settings on the Data tab so that your kernel uses its own (forked) copy of the kernel as a data source. Edit the kernel, Commit & Run it, and watch as this beautifully terrible idea tears through another instruction in the included example program. Wow!

Was this useful? Well, maybe knowing that kernels are Turing complete doesn't make the world a better place. But it does allow me to use the following meme around the workplace un-ironically:

And that's a big enough win for me.