Tuesday, September 14, 2010

The "von Neumann" machine & parallel processing

It's been a while since I last posted a message. I've been thinking a lot about machine architectures, and about the most prevalent one in particular: the "von Neumann" architecture. This architecture basically serves the Turing machine very well. The Turing machine is more of a philosophical model, whereas the von Neumann architecture is a specification for implementing one. The CPU here is a serial processing unit which, whichever way you put it, never executes more than one instruction at any given time, one per clock cycle. The CPU has a number of pins that connect it to outside devices or buses. Sometimes the buses don't contain anything at all, but you can put PCI, PCIe, ISA or other kinds of electronics on them, or attach a device to a serial, parallel, USB or FireWire port, after which the more advanced operating systems will automatically register these devices and allocate address spaces and such (through a driver and a hardware-specific communication protocol like the one USB has).

The architecture above is very well suited to mathematical calculations, because you can write programs that are expected to execute their code serially, and this architecture will by definition execute those commands one after the other: store the intermediate results in memory, pick them up again at a later stage, post-process them with other results, produce output to the screen, write it to disk, send it over the internet, and whatever other "logic" you dare to endeavour in.
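To make that concrete, here is a tiny sketch of my own (not taken from any real program) of such a serial pipeline, where each step only makes sense once the previous one has finished and its intermediate result has been stored:

# A strictly serial pipeline: each statement depends on the result of the
# previous one, so the CPU has no choice but to execute them in this order.
def serial_pipeline(values):
    total = sum(values)               # compute and store an intermediate result
    mean = total / len(values)        # pick it up again and post-process it
    report = "mean = %.2f" % mean     # produce output from the combined results
    return report

if __name__ == "__main__":
    print(serial_pipeline([3, 1, 4, 1, 5, 9]))   # mean = 3.83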

The above describes a regular PC, and everything about it is deterministic. Its very architecture makes it very hard, if not impossible, to write programs for it that behave in ways you would not be able to explain, assuming you have the tools available to verify the signals and so on (so it's a theoretical statement). This deterministic processing is what makes the design viable. It is by its nature serial: the CPU must retrieve one instruction to execute, fetch something from memory, store something back into memory, and do all of this in the correct order for the program to function correctly.

An architecture that is still too complex for us to design is one where the CPU no longer processes one instruction per clock cycle, one after the other in serial fashion, but where you have a very high number of processing units, and memory is not in one place but distributed around that network as well.

Most applications that we write nowadays are data-driven, and I think more and more applications are being written in a data-driven style instead of purely as code. Anywhere but the enterprise world, that is :). Data-driven programming means that you move logic, output or processing out of code (long switch statements or branches) and into processing tables, data files or whatever, as the sketch below illustrates. Almost all, if not all, machine learning programs are by definition data-driven: the algorithm itself is usually quite compact as a program, but frequently needs mountains of data to function correctly.
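A small sketch of the difference (a hypothetical example of mine, Python used only for brevity): the first version hard-codes the logic in branches, the second moves it into a table that could just as well be loaded from a data file.

# Branchy version: the logic lives in code.
def shipping_cost_branchy(country):
    if country == "NL":
        return 5.00
    elif country == "BE":
        return 7.50
    elif country == "DE":
        return 9.00
    else:
        return 15.00

# Data-driven version: the logic lives in a table.
SHIPPING_TABLE = {"NL": 5.00, "BE": 7.50, "DE": 9.00}

def shipping_cost_data_driven(country, table=SHIPPING_TABLE, default=15.00):
    return table.get(country, default)

Changing the behaviour of the second version means editing data, not code, which is exactly the shift I mean.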

The main complication in the entire thing is that purely serially written code avoids the complexity of having to synchronize results later on, and most importantly at the right time. That is also what plagues the parallel architectures and programs that already exist, and it complicates designs for GPUs, FPGAs and so on. You could also call it the binding problem, as it is known in psychology. It's easy to think of channels of processing in which a little bit of the whole happens at each single stage, but connecting the dots later, so that the larger whole means more than each channel individually, is a very large challenge. Maybe not if you have the time to put it together thoroughly at leisure, but certainly in real time.
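Here is a minimal sketch of my own (the function names are made up for illustration) of where that synchronization point sits in parallel code: the partial results are useless until they have all been collected and combined at the right moment.

from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    return sum(chunk)

def parallel_sum(values, workers=4):
    # Split the work into independent pieces that can run in parallel.
    chunks = [values[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # The synchronization ("binding") step: the partial results only become
    # meaningful once they are all gathered and combined, at the right time.
    return sum(partials)

if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))   # 499500

The workers themselves are easy; it is the gather-and-combine step at the end that serial code never has to think about.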