
Dark integers

(or, "Yet another way to learn more about the underlying structure of the universe.")

Let us assume.

I was reading my article on the distribution of computation space and going back over my results, but this time around, a new potential implication took root in my imagination.

Let's assume for the moment that Turing-completeness is necessary and sufficient for computation of any general sort, and furthermore, that no mechanism exists that can compute on a fundamentally higher level. In other words, let us assume that every computing device or mechanism is isomorphic to any other such device or mechanism.

Let's also assume that my experimental analysis of computation space was more or less correct, at least in the broad strokes. If someone else conducted a similar experiment using a substantially different instruction set or methodology, some of the specifics might turn out differently, but I suspect the more fundamental properties I found would still show up.

These aren't trivial assumptions, but they're at the very least plausible. So we proceed.

Scarcity.

Near the end of my article, I speculated that primes would probably be under-represented in a list of numbers generated by random code and sorted by frequency. This was partly an observation, but mostly a logical argument; by their nature, primes eschew association with simple patterns, and simple patterns are the best you're going to get out of a random program generator. I would not expect primes to start showing up in bulk until we reached the level of complexity where a prime sieve might reasonably form, and that's way past the scope of what we've been dealing with so far. (Side note: it would be interesting to figure out the shortest possible prime-generating program using that sort of basic instruction set.)

What the article doesn't include is my follow-up analysis on the subject, which indeed showed that primes seem to be relatively less frequent than other integers in their neighborhood. Here's a plot relating the set of integers (absolute values were taken) to the number of times each one appeared in the cumulative output of a billion random micro-programs.
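Before the plot itself, here's a minimal sketch of how that kind of tally gets built, just to make the setup concrete. The instruction set below is an assumed toy stand-in (a handful of register operations), not the generator I actually used, and it runs far fewer programs, but the shape of the experiment is the same: build random micro-programs, run them, and count what lands in the output register.

    import random
    from collections import Counter

    # Toy stand-in for the micro-program experiment: an assumed register
    # machine with a few primitive operations, just to show the shape of
    # the tally, not the original instruction set.
    OPS = ("inc", "dec", "dbl", "add", "mov")

    def run_random_program(n_instr=12, n_regs=4, rng=random):
        regs = [0] * n_regs
        for _ in range(n_instr):
            op = rng.choice(OPS)
            a, b = rng.randrange(n_regs), rng.randrange(n_regs)
            if op == "inc":
                regs[a] += 1
            elif op == "dec":
                regs[a] -= 1
            elif op == "dbl":
                regs[a] *= 2
            elif op == "add":
                regs[a] += regs[b]
            else:  # mov
                regs[a] = regs[b]
        return abs(regs[0])  # absolute value, as in the plot

    counts = Counter()
    for _ in range(100_000):  # the real run used a billion programs
        counts[run_random_program()] += 1

    for n, c in sorted(counts.items())[:20]:
        print(n, c)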

[Plot: Integer frequency through N=600]

There is a lot in this plot that bears discussion, but let's stick to the highlights:

  • It is roughly logarithmic.
  • There are conspicuous spikes (and patterns of spikes), most notably around each %2^n%.
  • The red dots—you may have to click on it and zoom in to see them—are primes.

Conspicuous spikes.

If you haven't figured it out yourself, the spikes on the powers of two are a consequence of the binary assembly code in which these micro-programs are written. It's operationally trivial for computers to multiply or divide by two, so you'll see it a lot in a random distribution like this. And getting to %64% is not much more work than getting to %32% (after all, the infrastructure is already in place). The same goes for %128%, %256%, and so on, which is why you see these spikes and why they're barely even tapering off.
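One way to see this without running anything random: measure the fewest operations needed to build each number under a toy op set. The sketch below assumes just {+1, -1, ×2} starting from zero, which is not the real instruction set, but it already shows powers of two being cheap and their odd neighbors paying a premium.

    from collections import deque

    def construction_cost(limit=600):
        """Fewest {+1, -1, *2} steps needed to reach each n in [0, limit],
        starting from 0. A toy proxy for how easy n is to generate."""
        cost, queue = {0: 0}, deque([0])
        while queue:
            x = queue.popleft()
            for nxt in (x + 1, x - 1, x * 2):
                if 0 <= nxt <= limit and nxt not in cost:
                    cost[nxt] = cost[x] + 1
                    queue.append(nxt)
        return cost

    cost = construction_cost()
    for n in (16, 31, 32, 33, 64, 128, 256):
        print(n, cost[n])

Under this toy op set, %32% comes out one step cheaper than %31%, which is exactly the mesa-and-crevice story picked up below.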

Red dots.

Closer examination of where these dots fall shows that they're almost always in a crevice, or at least on a slope, which is to say that the number in question has a notably smaller chance of being generated than one or both of its neighbors.

However, primes are way too slippery a customer to allow this to always happen. Take, for example, %n=31%. This is the first prime that's sitting up on a mesa. I argued before that primes are hard to generate, so what gives?

Well, for many purposes, the distribution of primes is effectively random. Because of this, some of them are doomed to land right next to a really popular neighbor, as with our poor %31%. As discussed, %32% is especially easy to generate, and as it happens, adding or subtracting %1% is also generally a piece of cake. It's just bad luck for %31% that his neighbor attracts all this traffic for him.

Degrees of introversion.

The example of %31% is particularly egregious, since those %2^n% spikes are the biggest bullies on the playground. But all primes are affected by this phenomenon to some extent. Since a prime by its nature can't be directly conjured via a loop (read: multiplication, exponentiation, hyper-operations), the only way to get to one of these numbers (short of, e.g., a prime sieve) is to land on some composite number close to it and then happen to add or subtract the right constant. The frequency of any prime is ultimately a reflection of its neighborhood.

There are lots of consequences to this arrangement. Turning it around, one good example is the case of twin primes, which are primes that differ by only %2%. Past the first pair (%3% and %5%), you only get twin primes when the number between them is divisible by %6%. Any multiple of %6%, being divisible by %2%, %3%, and %6%, is right up there as an easy target for random generation. The upshot here is that when you have a twin prime pair, you will almost always see a single-value spike between them, since they're surrounding such a juicy target.
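That divisibility claim is easy to check directly, and the reason is simple: any prime above %3% is %1% or %5 \pmod 6%, and only the %5 \pmod 6% case leaves room for %p+2% to be prime too. A quick sketch with plain trial-division primality:

    def is_prime(n):
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True

    # Every twin-prime pair above (3, 5) straddles a multiple of 6.
    for p in range(5, 600):
        if is_prime(p) and is_prime(p + 2):
            assert (p + 1) % 6 == 0
            print(p, p + 2, "straddle", p + 1)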

But enough fun with plots.

Dark integers.

This is a term I pulled out of my ass which I like so much that I refuse to Google it and find out it's already taken.

We've shown how certain integers are "popular" in terms of computational space frequency, like the powers of two; we have also shown that some tend in the other direction, such as the primes. A dark integer, according to me, is loosely defined as an integer that's especially hard to generate through an algorithm of any kind. I am hampered by the fact that I don't have a working theory as to how to strictly identify these numbers other than by brute force, but I have some idea of their properties.

They are not necessarily prime—in fact, they seemingly prefer to be complicated composites with many disparate factors, often including one or more large primes. It does make some sense that this would be the ideal setup for a dark integer. The erratic, informationally dense set of factors such a number comprises already makes it unpopular, and not being prime itself lets it sit next to (or near) one or more primes, which will not attract any spillover traffic.

'Darkness' in this sense is a relative term, at least for the moment. Perhaps it will make sense to define a dark number as any integer %N% whose Kolmogorov complexity is strictly larger than that of every %n < N%, although Kolmogorov complexity isn't computable, so that's still difficult to establish for any particular %N%. At any rate, some numbers are darker than others, and while darkness should roughly correspond to which numbers are lowest on a plot such as the one above, we have to remember that this is an experimentally-derived data set and prone to noise, especially as frequency decreases.
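Lacking a theory, one brute-force way to operationalize that is to flag every integer whose cost sets a new record among everything below it. The sketch below uses the same toy {+1, -1, ×2} cost proxy from earlier rather than my actual frequency data, so its record-setters won't match the real plot, but it's the kind of search I have in mind.

    from collections import deque

    def construction_cost(limit=4000):
        # Fewest {+1, -1, *2} steps to reach n from 0 (same toy proxy as before).
        cost, queue = {0: 0}, deque([0])
        while queue:
            x = queue.popleft()
            for nxt in (x + 1, x - 1, x * 2):
                if 0 <= nxt <= limit and nxt not in cost:
                    cost[nxt] = cost[x] + 1
                    queue.append(nxt)
        return cost

    cost = construction_cost()

    # "Dark" candidates under this proxy: integers whose cost strictly
    # exceeds the cost of every smaller integer, echoing the tentative
    # Kolmogorov-style definition above.
    record, dark_candidates = -1, []
    for n in range(1, 4001):
        if cost[n] > record:
            record = cost[n]
            dark_candidates.append(n)
    print(dark_candidates)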

That said, I would like to tip my hat to %3465%, which was the lowest number that did not appear a single time in my billion programs. Wolfram Alpha has this to say about %3465%:

  • %3465 = 3^2 \times 5 \times 7 \times 11%.
  • %3465% has the representation %3465 = 2^7 \times 3^3 + 9%.
  • %3465% divides %34^6-1%.

Whether any of that is significant is too soon to say.
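For what it's worth, the identities themselves are trivial to verify:

    # Sanity check of the three Wolfram Alpha facts about 3465.
    assert 3**2 * 5 * 7 * 11 == 3465
    assert 2**7 * 3**3 + 9 == 3465
    assert (34**6 - 1) % 3465 == 0
    print("all three identities check out")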

Time to get crazy.

So, do you remember what we're assuming about computers? Basically, that a computer is a computer by any other name? It comes into play now.

  • Assumption: The universe is not magic. It plays by the same Turing-equivalent rules as everything in it. This means physics is ultimately deterministic, and if we had infinite time and infinite memory, we could simulate our universe on a Turing machine.
  • Assumption: The Big Bang happened. The initial configuration of energy at its inception was effectively random.

The universe seemingly operates by following physical laws, with all matter and energy having been kicked off by that hot mess of a Big Bang. This strikes me as loosely analogous to my Skynet micro-program generator, what with random initial conditions being run through clearly defined, step-by-step processes. More to the point, our earlier assumption posits that they are mathematically interchangeable for our purposes.

Hypothesis: For any given countable set of anything in the universe, the quantity of that thing is relatively unlikely to be a dark integer, all things being equal.

To test the hypothesis:

  1. Identify dark integers. The more, the larger, the merrier.
  2. Identify anything that can be quantified.
  3. Quantify the anything.
  4. Repeat to within error bounds for confirmation or disproof.

There may, unfortunately, be some bounds on what the anything could actually be. Ideally, it would be as simple as counting a bunch of atoms of something over and over and tracking the results, but more likely it would have to be something more directly related to the Big Bang, such as star counts in galaxies.
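To sketch what step 4 might look like once steps 1 through 3 are done: compare how often the measured quantities land on dark integers against how often random counts in the same range would. Everything below is a placeholder—the dark-integer set, the observed counts, and the uniform null model, which is a crude stand-in for whatever the real expected distribution of counts turns out to be.

    import random

    def darkness_test(observed_counts, dark_set, lo, hi,
                      trials=10_000, rng=random.Random(0)):
        """Fraction of observed counts that hit 'dark' integers, plus a
        Monte Carlo p-value for 'fewer hits than uniform chance in
        [lo, hi]'. Placeholder statistics, not a finished methodology."""
        hits = sum(1 for c in observed_counts if c in dark_set)
        n = len(observed_counts)
        as_few_or_fewer = 0
        for _ in range(trials):
            sim = sum(1 for _ in range(n) if rng.randint(lo, hi) in dark_set)
            if sim <= hits:
                as_few_or_fewer += 1
        return hits / n, as_few_or_fewer / trials

    # Hypothetical inputs: the dark set would come from the brute-force
    # search; the observed counts from whatever actually gets quantified.
    dark_set = {3465}  # plus whatever else the search turns up
    observed = [random.randint(3000, 4000) for _ in range(500)]  # stand-in data
    print(darkness_test(observed, dark_set, 3000, 4000))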

If all of this were sound, conclusions could be drawn.

If the measured quantities really do tend to avoid dark numbers, it may be further evidence of the truly random and arbitrary nature of creation, and more importantly, it may be strong evidence of a deterministic universe and all that that implies.

If there is no discrepancy when it comes to dark numbers, it may imply that there is a truly stochastic or otherwise bizarre layer to physics, or it may imply that the Big Bang was ordered more deliberately than one would think, or that there is some agency at work that is disrupting the "natural" order of things in one way or another.

Either way, we learn.


Addendum

The dark integers on the graph above are specific to the instruction set used to generate it. While there is reason to expect that most sets of operations will be similar inasmuch as a binary system seems to be the simplest approach, it is not a guarantee. One can imagine pathological instruction sets that will yield an entirely different contour, such as one that uses the primes as its base units.

Long story short, if the binary analysis doesn't turn up results, it might be worth investigating the possibility of reverse engineering an instruction set by identifying its specific dark integers and trying to tease out what sort of atomic operations would give rise to those holes.

While applying any of this to cosmology is wildly speculative, biology and the mechanism of evolution seem like perfect candidates for testing the viability of dark integers in general. Evolution is undeniably a finite computational force that lends itself well to the whole idea. Multicellular organisms are literally built from code, and it stands to reason that, on balance, evolutionary pressures will tend to restrict most quantifiable biological features to easier-to-compute amounts. More on this as I investigate.

A thought of little bit

So here's how I went down the information theory rabbit hole this morning.

How much information is in a bit?

You might say '%1% bit.' I am inclined to agree.

How much information is in two bits?

My answer: it depends.

Disregarding any other considerations, and on average, I would agree that two bits store two bits of information. However, let's say that the first bit you receive from some black box is a %0%. (Or it could be a %1%, and after inverting the rest of the following argument, it would yield identical results, which perversely almost makes it seem as though that first bit carries no information after all.)

But I digress. So, you have your %0%. Now, out comes another %0% as the second bit.


A word on this black box. You don't know what's inside it. It could be a friend connected through the internet, it could be a random number generator, it could be sequential bits from a vacation picture on your hard drive, it could be aliens trying to make contact, you name it.

Now, the specific details of what sorts of bit-generating agents might be in there, and with what probabilities, are actually an absolutely critical point which is not to be glossed over, and so I will cheerfully disregard it entirely, except to say that for our purposes, we will allow in a hand-wavy way that each of the possibilities listed above is a valid, non-negligible one, as that is the spirit of this thought experiment.

Moving on. A priori, the second bit (%b_2%) could have been a %0% or a %1%. Estimating which was more likely, without knowing what the probability distribution over possible bit-generating entities actually is, is arguably impossible, but with the barest of assumptions, this is no longer the case. I maintain that a second %0% was indeed the more likely bit to arrive, and by a significant margin, too.


Consider it through the lens of entropy. Whatever is in the box, it will be emitting bits following some kind of overarching guidelines, which you can consider ranging from "perfectly ordered" to "perfectly entropic", the latter being more or less synonymous with "completely random." If it is a true RNG in there, then your odds are 50/50 for each bit that comes out, including %b_2%.

However, if it is something highly ordered, say a simple computer program, it may well be printing an endless stream of zeros. As was strongly indicated by my previous excursion into the cartography of computation space, repeated streams of bits are the single easiest thing to generate with any kind of reasonable computational framework; note that computational framework here can refer to anything from a programming language to the behavior of an ant colony on down to physics itself. It simply takes less work to generate some numbers than others. So, all things being roughly equal, I submit that you'll find a second %0% more often than not. This represents the more ordered possibility, while %b_2 = 1% would be the more entropic one.
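To put a toy number on that intuition: suppose, purely for illustration, that the box holds either a maximally ordered source that picks its first bit at random and then repeats it forever (prior weight %w%), or a fair RNG (weight %1 - w%). Both explain the first %0% equally well, so the weights don't move, and the chance of a second %0% comes out above one half for any %w > 0%.

    def p_second_zero(w_ordered):
        """P(b2 = 0 | b1 = 0) under an assumed two-source mixture:
        a 'pick a bit, then repeat it forever' program (weight w_ordered)
        and a fair RNG (weight 1 - w_ordered). Both emit a first 0 with
        probability 1/2, so seeing b1 = 0 leaves the weights unchanged."""
        w_prog, w_rng = w_ordered, 1.0 - w_ordered
        # The repeating program now emits 0 with certainty; the RNG, half the time.
        return w_prog * 1.0 + w_rng * 0.5

    for w in (0.1, 0.3, 0.5):
        print(w, p_second_zero(w))

Even a 10% prior weight on the ordered source already makes the second %0% a 55/45 favorite.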

Because of these probabilities, the underlying information content of that second bit is changed. After all, if something very likely happens, you've gained very little information; if the sun rises tomorrow, you can't conclude much from it. If the sun were not to rise, well, that contains a LOT of information, information you'd get to enjoy for maybe a month or two before the atmosphere freezes over.
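This is exactly what Shannon's self-information (surprisal) measures: an event with probability %p% carries %-\log_2 p% bits. The probabilities below are made up purely to show the scale.

    import math

    def surprisal_bits(p):
        # Shannon self-information of an event with probability p, in bits.
        return -math.log2(p)

    # Illustrative probabilities only: near-certain events carry almost
    # nothing; very unlikely ones carry a lot.
    for p in (0.999999, 0.5, 1e-6):
        print(p, round(surprisal_bits(p), 4))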


Let's try a more mathematical but equally stark demonstration of the same essential argument. Consider: you watch the box for days, and all it does is spit out %0%s, one after another. Billions of them. Each new zero says even less than the one before it (or, arguably, devalues all bits received retroactively—probably moot).

And then, one day, you wake up and see a single %1% came through. Being a man or woman of science, you eventually think to count exactly how many zeros came out before the %1%, and to your amazement, there are %3,141,592,654% zeros, which you naturally immediately recognize as %\pi \times 10^9%. Even setting aside any deeper implications about what that might portend, it is now obvious that the odds are extremely good that there is an agent or process which is highly structured at work in the box.

All that from a single bit. Clearly that %3,141,592,655%th bit contained more information than the billions before it. You might say it shares its information content among all the bits before it, but it has still, by itself, made an enormous net change to the Shannon entropy and Kolmogorov complexity of the data. As a lower bound, you might say that in such a pattern, where you have %n% %0%s followed by a single %1%, that %1% contains at least a ballpark figure of %\log_2 n% bits of actual information, that being the amount of information inherent in the number generated. E.g., %0000 0000 1% gives you %8% zeros, and %\log_2 8 = 3%, which is the number of bits you'd require to specify a number between %1% and %8% inclusive.
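Plugging the two examples into that ballpark estimate: the toy run of %8% zeros gives %3% bits, and the %\pi%-length run gives a bit over %31% bits for that single %1%.

    import math

    # Ballpark information carried by the single 1 ending a run of n zeros,
    # per the log2(n) estimate above.
    for n in (8, 3_141_592_654):
        print(n, round(math.log2(n), 2), "bits")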

So what?

I don't know. Stuff is neat.

(Maybe I'll show how primes fit into all this tomorrow!)