So here's how I went down the information theory rabbit hole this morning.

How much information is in a bit?

You might say '%1% bit.' I am inclined to agree.

How much information is in two bits?

My answer: it depends.

Disregarding any other considerations, and on average, I would agree that two bits store two bits of information. However, let's say that the first bit you receive from some black box is a %0%. (Or it could be a %1%; invert the rest of the following argument and you get identical results, which perversely almost makes it seem as though that first bit carries no information after all.)

But I digress. So, you have your %0%. Now, out comes another %0% as the second bit.


A word on this black box. You don't know what's inside it. It could be a friend connected through the internet, it could be a random number generator, it could be sequential bits from a vacation picture on your hard drive, it could be aliens trying to make contact, you name it.

Now, the specific distribution and the kinds of bit-generating agents that might be in there are actually an absolutely critical point, not to be glossed over, and so I will cheerfully and completely disregard them, except to say that, for our purposes, we will allow in a hand-wavy way that each of the possibilities listed above is a valid, non-negligible candidate. That is the spirit of this thought experiment.

Moving on. A priori, the second bit (%b_2%) could have been a %0% or a %1%. Estimating which was more likely, without knowing the actual probability distribution over possible bit-generating entities, is arguably impossible; but grant the barest of assumptions and it no longer is. I maintain that a second %0% was indeed the more likely bit to arrive, and by a significant margin, too.
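
To make those "barest of assumptions" concrete, here's a toy Bayesian sketch. The model is entirely my own simplification, not something the argument depends on: just two hypotheses with a 50/50 prior, a fair RNG versus a program that prints zeros forever.

```python
# Toy model (my own assumption, not part of the original setup): the box is
# either a fair RNG or a print-zeros-forever program, with a 50/50 prior.
prior = {"rng": 0.5, "zeros": 0.5}
p_zero = {"rng": 0.5, "zeros": 1.0}  # P(next bit = 0 | hypothesis)

# Posterior over hypotheses after observing the first 0 (Bayes' rule).
evidence = sum(prior[h] * p_zero[h] for h in prior)
posterior = {h: prior[h] * p_zero[h] / evidence for h in prior}

# Predictive probability that the second bit is also a 0.
p_second_zero = sum(posterior[h] * p_zero[h] for h in posterior)
print(posterior)      # {'rng': 0.333..., 'zeros': 0.666...}
print(p_second_zero)  # 0.833..., comfortably above 50/50
```

Even under this crude prior, a second %0% is heavily favored.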


Consider it through the lens of entropy. Whatever is in the box, it emits bits according to some overarching rule, which you can place on a spectrum from "perfectly ordered" to "perfectly entropic", the latter being more or less synonymous with "completely random." If it is a true RNG in there, then your odds are 50/50 for each bit that comes out, including %b_2%.

However, if it is something highly ordered, say a simple computer program, it may well be printing an endless stream of zeros. As was strongly indicated by my previous excursion into the cartography of computation space, repeated streams of bits are the single easiest thing to generate with any kind of reasonable computational framework; note that "computational framework" here can refer to anything from a programming language to the behavior of an ant colony on down to physics itself. It simply takes less work to generate some numbers than others. So, all things being roughly equal, I submit that you'll find a second %0% more often than not. This represents the more ordered possibility, while %b_2=1% would be the more entropic possibility.
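
A quick way to feel this in practice: use compressed size as a crude stand-in for description length (a rough proxy, nothing like a real Kolmogorov measure, and the stream lengths below are my own arbitrary choices).

```python
import random
import zlib

n = 10_000
ordered = b"\x00" * n  # the all-zeros stream
random.seed(0)
entropic = bytes(random.getrandbits(8) for _ in range(n))  # a maximally disordered stream

# Compressed size approximates how short a description of each stream can be.
print(len(zlib.compress(ordered)))   # a few dozen bytes
print(len(zlib.compress(entropic)))  # close to n; random data barely compresses
```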

Because of these probabilities, the underlying information content of that second bit is changed. After all, if something very likely happens, you've gained very little information; if the sun rises tomorrow, you can't conclude much from it. If the sun were not to rise, well, that contains a LOT of information, information you'd get to enjoy for maybe a month or two before the atmosphere freezes over.
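
In Shannon's terms, the information carried by an outcome is its surprisal, %-\log_2 p%: near-certain events carry almost nothing, near-impossible ones carry a lot. (The probabilities below are invented purely for illustration.)

```python
import math

def surprisal(p: float) -> float:
    """Shannon self-information, in bits, of an event with probability p."""
    return -math.log2(p)

print(surprisal(0.5))     # 1.0 bit: a fair coin flip
print(surprisal(0.9999))  # ~0.0001 bits: the sun rising tomorrow
print(surprisal(0.0001))  # ~13.3 bits: the sun failing to rise
```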


Let's try a more mathematical but equally stark demonstration of the same essential argument. Consider: you watch the box for days, and all it does is spit out %0%s, one after another. Billions of them. Each new zero says even less than the one before it (or, arguably, retroactively devalues all the bits received so far; probably moot).

And then, one day, you wake up and see a single %1% came through. Being a man or woman of science, you eventually think to count exactly how many zeros came out before the %1%, and to your amazement, there are %3,141,592,654% zeros, which you naturally and immediately recognize as %\pi \times 10^9%. Even setting aside any deeper implications about what that might portend, it is now obvious that the odds are extremely good that some highly structured agent or process is at work in the box.

All that from a single bit. Clearly that %3,141,592,655^{th}% bit contained more information than the billions before it. You might say it shares its information content among all the bits before it, but it has still, by itself, made an enormous net change to the Shannon entropy and Kolmogorov complexity of the data. As a lower bound, you might say that in such a pattern, where you have %n% %0%s followed by a single %1%, that %1% contains at least a ballpark figure of %\log_2 n% bits of actual information, that being the amount of information inherent in the number generated. E.g., %0000 0000 1% gives you %8% zeros, and %\log_2 8 = 3%, which is the number of bits you'd need to specify a number between %1% and %8% inclusive.
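
Plugging the story's actual count into that ballpark formula:

```python
import math

zeros_before_the_one = 3_141_592_654
# By the log2(n) lower bound above, the lone 1 singles out one of
# ~3.14 billion possible positions, worth roughly this many bits:
print(math.log2(zeros_before_the_one))  # ~31.55 bits
```

Call it about 32 bits from one %1%, against the near-zero bits each of the preceding zeros carried.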

So what?

I don't know. Stuff is neat.

(Maybe I'll show how primes fit into all this tomorrow!)