
Machine Interoperability

2016-07-15

I've become persuaded that typical modern approaches to automated interop are deeply flawed.

Generally, the current approaches are to (1) call a subroutine in the same memory space, (2) call a program in a different memory space, (3) do RPCs via web APIs (REST is a restricted form of RPC), or (4) exchange data via some enterprise bus. We can also do some variants of self-description using ASN.1, the WS-* standards, etc. If we're careful here, we can do a very good job at this.
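To make the distinction concrete, here is a minimal Haskell sketch (all names hypothetical) of how those boundaries show up in the types: an in-process call can stay pure, while anything that crosses a memory space or the network has to admit failure.

    -- (1) Subroutine in the same memory space: plain function application.
    localLookup :: Int -> String
    localLookup n = "user-" ++ show n

    -- (3) RPC via a web API: the same logical operation, now effectful and
    -- fallible. RpcError stands in for timeouts, malformed payloads, etc.
    data RpcError = Timeout | BadResponse String
      deriving Show

    remoteLookup :: Int -> IO (Either RpcError String)
    remoteLookup n = pure (Right ("user-" ++ show n))  -- stub; a real client would do HTTP here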

Essentially, what the current world does, when it's robust, is presume that the system is open: data exists outside the ken of the system, that data is occasionally in error, and the code is stable. Internally, the program requires its own components to be correct for a given execution trace.
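A small sketch of that assumption in Haskell (the names and field are invented): external data gets checked once at the boundary, and the interior code then trusts what it is handed.

    import Text.Read (readMaybe)

    -- Boundary: data from outside the system may be in error, so it is
    -- parsed and validated here, exactly once.
    newtype Age = Age Int

    parseAge :: String -> Maybe Age
    parseAge s = do
      n <- readMaybe s
      if n >= 0 && n < 150 then Just (Age n) else Nothing

    -- Interior: no re-checking; the code presumes its inputs are already
    -- correct for this execution trace.
    canVote :: Age -> Bool
    canVote (Age n) = n >= 18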

The effective situation here is that all the programs have to be correct; all the programmers have to write them; and all the programmers on a system have to be able to enumerate the code involved in it.

The Haskell/ML effort is a programme aimed at achieving provably correct, trusted programs. It has had significant difficulty being deployed and coping with the grunginess of industry data. Too many NULLs are out there; too much lousy code. We can Rewrite All The Things, but that's not feasible at scale. The basic question today is: how do you write software that does the right job, even when the code itself does the wrong thing?
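For contrast, here is roughly what that discipline demands (a toy example, not drawn from any real system): absence has to be modelled in the type, so every NULL in the legacy data becomes a case the code is forced to handle.

    import Data.Maybe (fromMaybe)

    -- A database NULL becomes a Maybe; the compiler then refuses to let
    -- any caller forget that the value might be missing.
    data Customer = Customer
      { name       :: String
      , middleName :: Maybe String
      }

    greeting :: Customer -> String
    greeting c = name c ++ " " ++ fromMaybe "" (middleName c)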

I have been inspired by Jaron Lanier's idea of phenotropic programming[1]: what if we had programs that achieve correctness with unreliable components? I don't mean to say we should be programming neural networks - that implicitly requires a training set and the idea that we're training against primarily numerically-encodable data. In my career, I have worked on things such as build systems, identity systems, and CRUD systems for artifact data. None of these are achievable with today's neural network software. It's an impedance mismatch: I need specific code doing specific things and recording specific other things. Neural networks recognize patterns and calculate numerical outputs.

There's an effort I've seen to translate existing functions into probabilistic functions. That doesn't solve it either; it doesn't relate to the work I do. Again: not crunching numbers here, folks.
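As I understand the idea, it amounts to something like the following (Dist is a toy type here, not any particular library): a value becomes a distribution over values, and existing functions get lifted pointwise.

    newtype Dist a = Dist [(a, Double)]   -- outcomes paired with probabilities
      deriving Show

    instance Functor Dist where
      fmap f (Dist xs) = Dist [ (f x, p) | (x, p) <- xs ]

    double :: Int -> Int
    double = (* 2)

    uncertainInput :: Dist Int
    uncertainInput = Dist [(1, 0.5), (2, 0.5)]

    result :: Dist Int
    result = fmap double uncertainInput   -- Dist [(2,0.5),(4,0.5)]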

Another effort is "spec the output, run a genetic program to generate it". As professional software engineers know, a fully specced program implies the program is the spec. Again, not really ideal.

There's an effort now to partially generate programs based upon unit tests. Here we are getting somewhere, though I haven't dug into it. It's similar to the previous one, but more feasible in theory (but what happens if you need to live in the IO monad? Bad day).
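In Haskell terms, the idea looks something like this (the tests and the candidate are made up): the unit tests act as a partial spec, and the generator's only job is to find some function that satisfies them.

    -- (input, expected output) pairs: the unit tests, i.e. the partial spec.
    tests :: [(Int, Int)]
    tests = [(0, 0), (1, 1), (2, 4), (3, 9)]

    satisfies :: (Int -> Int) -> Bool
    satisfies f = all (\(x, y) -> f x == y) tests

    -- A candidate the generator might propose; satisfies candidate == True.
    candidate :: Int -> Int
    candidate x = x * x

A pure candidate can be checked cheaply and repeatably; one that has to live in IO is far harder to judge, which is the bad day.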

I'm not precisely sure where to go with this problem. My initial intuition is that some kind of goal-finding system has to be built; this goal system has to have access to a large stock of templates in order to yield the program sought. That is often called a compiler, however, which would imply this initial intuition is a "5GL" in the '80s categorization of programming language systems. The goal-finding system would grasp that code is incorrect and discard it, using only the correct code. Hmmm.
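A toy rendering of that intuition in Haskell (everything here is hypothetical and deliberately tiny): a stock of templates, a goal the result has to meet, and a search that discards the templates that turn out to be incorrect.

    type Template = (String, Int -> Int)   -- a named candidate piece of code

    templates :: [Template]
    templates =
      [ ("broken",  \x -> x + 1)           -- incorrect code: discarded
      , ("alsoBad", \x -> x * 3)
      , ("square",  \x -> x * x)
      ]

    -- The goal the sought program must satisfy.
    goal :: (Int -> Int) -> Bool
    goal f = all (\(i, o) -> f i == o) [(2, 4), (3, 9)]

    -- Keep only the correct code; the first survivor is the program sought.
    synthesize :: Maybe Template
    synthesize = case filter (goal . snd) templates of
      (t : _) -> Just t
      []      -> Nothing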

Something I am chewing on is where self-modifying code might be able to take us in this quest. One aspect is that all programs would need an identical interface for self-description: "What do you do?" should be a machine-answerable question.
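One shape that uniform interface could take, sketched in Haskell (the class and the components are invented for illustration):

    -- Every component answers "What do you do?" through the same class.
    class SelfDescribing a where
      whatDoYouDo :: a -> String

    data BuildStep = Compile | Link

    instance SelfDescribing BuildStep where
      whatDoYouDo Compile = "I turn source files into object files."
      whatDoYouDo Link    = "I combine object files into an executable."

    -- Another program can now interrogate components uniformly:
    --   map whatDoYouDo [Compile, Link]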

[1] https://www.edge.org/conversation/why-gordian-software-has-convinced-me-to-believe-in-the-reality-of-cats-and-apples