information theory

Empirical evidence for the Axiom of Choice


(Credit to xkcd for the comic, of course.)

Near as I can figure, the go-to framework for mathematics these days is Zermelo-Fraenkel set theory, with or without the Axiom of Choice (denoted ZFC or ZF, respectively). The Axiom of Choice (AC) was contentious in its day, but the majority of mathematicians now accept it as a valid tool. It is independent of ZF proper, and many useful results require that you either adopt or reject it in your proof, but you're allowed to do either.

The simplified version of AC is that it allows you to take an element from each of any collection of non-empty sets, thereby forming a new set. Where it goes beyond ZF is that it lets you make infinitely many of these choices simultaneously without specifying any rule for how to choose, something ZF alone doesn't guarantee. Imagine you have an infinite number of buckets, each one containing an infinite number of identical marbles; AC finds it acceptable for you to posit, in the course of a proof, that you somehow select one marble from each of those buckets into a new infinite pile.

Unsurprisingly, this means that AC is equivalent to a number of other famous results, notably the well-ordering theorem, which counter-intuitively states that any set can be well-ordered given the right binary relation, even e.g. %\RR%. Worse yet, it lets you (via the Banach-Tarski paradox) disassemble a tennis ball into five or six pieces and reassemble them into something the size of the sun. In other words, AC is meant to be an abstract mathematical tool rather than a reflection of anything in reality.

Now, I could spend forever and a day discussing this sort of stuff, but let's move along to my thought: what it has to do with the Big Bang.

As you all know, the Big Bang ostensibly kicked off our universe some %13.8% billion years ago. However, some of the details are still kind of hazy, especially the further back you look. Past the chronological blinder of the Planck Epoch—the first %\sim 10^{-43}% seconds of our universe—there is essentially a giant question mark. All of our physics models and equations break down past that point. Theory says that the Big Bang, at its moment of inception, was an infinitely dense zero-dimensional point, the mother of all singularities. One moment it's chillin', the next it explodes, and over time becomes our universe.

I don't love this theory, but it's the best fit we've got so far for what we've observed, particularly with red-shifting galaxies and the cosmic microwave background radiation, so we'll run with it for the moment. So where did all of these planets and stars actually come from, originally? There is a long and detailed chronology getting from there to here, but let's concern ourselves with looking at it from an information-theoretic point of view, or rather, 'How did all of this order and structure come out of an infinitesimal point?'

It's unclear, but the leading (though disputed) explanation is inflation, referring to an extremely rapid and sizable burst of expansion in the universe's first moments. There is apparently observational data corroborating this phenomenon, but a lot of the explanation sounds hand-wavy to me, and as though it were made to fit the evidence. The actual large structure of the universe is supposed to have arisen out of scale-invariant quantum fluctuations during this inflation phase, which is a cute notion.

Note, by the way, that entropy was also rapidly increasing during this step. In fact, my gut feeling on the matter is that since entropy is expected to strictly increase until the end of time (maximum entropy), it makes plenty of sense that the Big Bang kernel would have had zero entropy—hell, it's already breaking all the rules. While thermodynamic and information-theoretic entropy are ostensibly two different properties, they're one and the same principle underneath. Unless I'm gravely mistaken, no entropy would translate to complete order, or put another way, absolute uniformity.

If that was indeed the case, its information content may have been nothing more than one infinitely-charged bit (or bits, if you like); and if that were the case, there must have been something between that first nascent moment and the subsequent arrival of complex structure that tore that perfect node of null information asunder. Whether it was indeed quantum fluctuations or some other phenomenon, this is an important point which we will shortly circle back to.

It's far from settled, but a lot of folks in the know believe the universe to be spatially infinite. Our observable universe is currently limited to something just shy of %100% billion light years across; however, necessary but not sufficient for the Big Bang theory is the cosmological principle, which states that we should expect the universe to be overwhelmingly isotropic and homogeneous, particularly on a large scale (%300%+ mil. light years). This is apparently the case, with statistical variance shrinking down to a couple percent or much less, depending on what you're examining.

That last bit is icing on the cake for us. The real victory took place during whatever that moment was where a uniform singularity became perturbed. (Worth noting that if the uniformity thing is true, that mandates that it really was an outside agency that affected it in order to spawn today's superstructure, which of course makes about as little sense as anything else, what with there presumably being no 'outside' from which to act.)

So here's the punchline. If you assume an infinite universe, that means that the energy at one time trapped featureless in that dimensionless point has since been split apart into an infinitude of pieces. "But it's still one universe!" you say. Yes, but I think it's equally valid to recognize that our observable universe is finite, and as such, could be represented by %\NN% numbers (if discrete), or %\RR% (if not), or %\aleph_n% if things are crazier than we know. Regardless, it could be described mathematically, as could any of the infinitely many other light cones which probably exist, each cozy in its own observable (but creepily similar) universe.

Likewise, we could view each observable universe as a subset of the original Big Bang kernel, since that is from whence everything came. It must then be representable as a set whose cardinality is at least that of the power set of all the observable-universe pockets, and therefore the act of splitting it into these subsets was a physical demonstration of the Axiom of Choice in reality!

I'm not sure what exactly that implies, but I find it awfully spooky. I feel like it either means that some things thought impossible are possible, or that this violation happened while the Big Bang kernel was still in its pre-Planck state. If that's the case, not only do our physical models break down there, our fundamental math would be shown not to apply in that realm either, which means it could have been an anti-logical, ineffable maelstrom of quasi-reality which we have no hope of ever reasoning about in a meaningful way.

Ideal combinatoric strings

This is something I worked on a year ago, so I'll keep it (relatively) brief.

Inspiration

There's a keypad on my apartment building which accepts 5-digit codes. One day on the way in, I started thinking about how long it would take to guess a working code by brute force. The most obvious answer is that with five digits, you have %10^5 = 100,000% possible codes; since each one of these is 5 digits long, you're looking at %500,000% button pushes to guarantee finding an arbitrary working code.

Realization

But then I realized you could be a little smarter about it. If you assume the keypad only keeps track of the most recent five digits, you can do a rolling approach that cuts down your workload. For example, say I hit 12345 67890. If the keypad works as described, you're not just checking two codes there, you're checking 6: 12345, 23456, 34567, 45678, 56789, 67890.
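
As a quick sanity check of that counting, here's a minimal Python sketch (assuming the keypad really does just test every window of the last five digits):

def codes_covered(digits, length=5):
    # Every distinct code a rolling keypad would check for this digit stream.
    return {digits[i:i + length] for i in range(len(digits) - length + 1)}

print(sorted(codes_covered("1234567890")))
# ['12345', '23456', '34567', '45678', '56789', '67890'] -- six codes from ten presses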

Foundation

The next natural question was how many button-pushes this could actually save. After doing some work, I satisfied myself that you could cut it down by a factor of %~D%, where %D% is the number of digits in a code. So if you typed the right digits in the right order, instead of the %500,000% before, you're only looking at %100,004% (you need an extra 4 to wrap the last few codes up).

Experimentation

The next natural question was: how do you actually come up with that string of digits? It has to be perfect, in that it contains every possible 5-digit combination without repeating any of them. As with almost any exploratory problem, the best approach is to simplify as much as possible. For instance, consider a binary code three digits long, which only has %2^3 = 8% different codes: %{000, 001, 010, 011, 100, 101, 110, 111}%.

My formula suggested you should be able to hit all eight of those in a %2^3 + 3 - 1 = 10% digit string, and it's easy enough to put one together by a little trial and error: 0 0 0 1 1 1 0 1. I found it was easiest to treat these strings as cyclical, so the 0 1 at the end wraps around to give you 0 1 0 and 1 0 0; written out linearly with the wrap-around included, that's the ten-digit string 0 0 0 1 1 1 0 1 0 0. As a bonus, any rotation of the cyclic version will work just as well.

Perspiration

As I scaled the problem up, however, more and more things became clear. First, above a certain point, you start getting multiple viable optimal strings that are not simple transformations of one another. Second, finding an elegant way to generate these strings was not turning out to be easy.

I found one mechanical way of generating a valid string that worked, but I didn't love it. If you list all the combinations you have to cover, and then slot each combination into a buffer greedily, meaning the earliest spot where it can fit (potentially with overlap), it works out.

E.g.: [figure: mechanical approach]

Beautification

At some point I realized the generative process could be viewed as a directed graph, each node representing an N-length code and its successors delineating the alternatives for continuing the string. After a few attempts, I got a pretty clear-looking one down (the node labels 0-7 are standing in for their binary counterparts):

[figure: directed graph]

As it turns out, you can start on any node and if you trace a Hamiltonian cycle—touching each vertex only once and ending back at the start—the numbers you hit along the way form a valid optimal string. This approach also scales with the parameters of the problem, but requires a messier or multi-dimensional graph.
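
Here's a minimal sketch of that idea in Python for the binary, 3-digit case from earlier (the graph construction and the brute-force backtracking search are my own illustration, not the original scratch work):

from itertools import product

# Nodes are the eight 3-bit codes; the successors of a code are the codes you
# reach by dropping its first bit and appending a new one.
N = 3
nodes = [''.join(bits) for bits in product('01', repeat=N)]
succ = {s: [s[1:] + b for b in '01'] for s in nodes}

def hamiltonian_cycle(path, seen):
    # Backtracking search for a cycle that visits every node exactly once.
    if len(path) == len(nodes):
        return path if path[0] in succ[path[-1]] else None
    for nxt in succ[path[-1]]:
        if nxt not in seen:
            found = hamiltonian_cycle(path + [nxt], seen | {nxt})
            if found:
                return found
    return None

cycle = hamiltonian_cycle(['000'], {'000'})
sequence = ''.join(node[-1] for node in cycle)  # last bit of each node along the cycle
print(cycle, sequence)

# The cyclic string covers all eight codes (it may be a different valid string
# than the one above, but any Hamiltonian cycle gives you one).
windows = {(sequence * 2)[i:i + N] for i in range(len(sequence))}
assert windows == set(nodes)

Changing N or the alphabet scales the same sketch up, at the cost of the messier graph mentioned above.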

Confirmation

Whenever I stumble into open-ended problems like this, I avoid Googling for the answer at first, because what's the fun in that? I was pretty sure this would be a solved problem, though, and after spending a while working this through myself, I looked for and found Wikipedia's page on De Bruijn sequences. As usual, I was beaten to the punch by somebody a hundred years earlier. Hilariously, however, in this case the results matched up better than expected. Check out the Wiki page, notably a) the graph on the right, and b) the first line of the section "Uses".

Notation

If you want to see the wild scratch pad which brought me from point A to point B, by all means, enjoy.

Lower bound of big-O for sort

This is a quick story about today's thing that I discovered that is already in Wiki. I must be up in the triple digits at this point. That said, this is one of the more obvious ones.

I was reading a thread about which sorting algorithm could be considered "best," and someone mentioned a couple of algorithms which allegedly run in %O(n log log n)% time. This came as a shock, since I'd thought %n log n% was the brick wall for an honest sort, and I wondered if these other sorts were for real, whether that would imply the existence of a linear time sort.

I tried to imagine what the lower bound would be, and figured there must be a minimum number of comparisons required for some number of elements. Didn't take long to get from there to realizing that %n% integers can be arranged in %n!% different permutations, which I reasoned meant that one must gather enough information (read: take comparisons) from the data to uniquely identify which of %n!% different arrangements the sort is in.

That, in turn, screams out for a log, specifically %log_2(n!)%. If the sort permutation is truly random, then on average, we should expect to be able to identify it from %log_2(n!)% bits (read: comparisons, again.) To be a little more precise, I guess it'd be more like \[ \frac{\lceil{n \log_2 (n!)}\rceil}{n} . \]
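
Just to put numbers on that, a quick sketch computing the plain %\lceil \log_2(n!) \rceil% version of the bound:

import math

# Information-theoretic lower bound on worst-case comparisons for sorting n items.
for n in range(1, 13):
    print(n, math.ceil(math.log2(math.factorial(n))))
# n = 12 gives 29, which is where the table mentioned below disagrees by one.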

I cheated a little here and plugged the asymptotics of %log_2(n!)% into Wolfram Alpha, and it was clear the dominating factor is, surprise surprise, %n log n%, just as Stirling's approximation would tell you. As for those mystery %n log log n% algorithms, they were tough to track down too, and there seemed to be a lack of final consensus on the issue. Due to the math described herein if nothing else, they must operate with some limitation or another on domain or input, assuming they do work.

Later, seeing the bottom of the Wikipedia page on sorting algorithms, I saw all of this done similarly, with the weird surprise that once you get to %n=12%, the minimum number of comparisons needed in the worst case inexplicably pulls one ahead of the bound, requiring %30% comparisons where we predict %29%. Sadly, the footnote links explaining those aberrations were in journals behind paywalls, but the gist seemed to be that it was a bit of an open (or at least highly non-trivial) question.

A thought of little bit

So here's how I went down the information theory rabbit hole this morning.

How much information is in a bit?

You might say '%1% bit.' I am inclined to agree.

How much information is in two bits?

My answer: it depends.

Disregarding any other considerations, and on average, I would agree that two bits store two bits of information. However, let's say that the first bit you receive from some black box is a %0%. (Or it could be a %1%, and after inverting the rest of the following argument, it would yield identical results, which perversely almost makes it seem as though that first bit carries no information after all.)

But I digress. So, you have your %0%. Now, out comes another %0% as the second bit.


A word on this black box. You don't know what's inside it. It could be a friend connected through the internet, it could be a random number generator, it could be sequential bits from a vacation picture on your hard drive, it could be aliens trying to make contact, you name it.

Now, the specific details of the distribution and nature of the bit-generating agents that might be in there are actually an absolutely critical point, not to be glossed over, and so I will cheerfully disregard them completely, except to say that for our purposes, we will allow in a hand-wavy way that each of the possibilities listed represents a valid, non-negligible candidate, as that is the spirit of this thought experiment.

Moving on. A priori, the second bit (%b_2%) could have been a %0% or a %1%. Estimating which was more likely, without knowing what the probability distribution over possible bit-generating entities actually is, is arguably impossible; but with the barest of assumptions, this is no longer the case. I maintain that a second %0% was indeed the more likely bit to arrive, and by a significant margin, too.


Consider it through the lens of entropy. Whatever is in the box, it will be emitting bits following some kind of overarching guidelines, which you can consider ranging from "perfectly ordered" to "perfectly entropic", the latter being more or less synonymous with "completely random." If it is a true RNG in there, then your odds are 50/50 for each bit that comes out, including %b_2%.

However, if it is something highly ordered, say a simple computer program, it may well be printing an endless stream of zeros. As was strongly indicated by my previous excursion into the cartography of computation space, repeated streams of bits are the single easiest thing to generate with any kind of reasonable computational framework; note that computational framework here can refer to anything from a programming language to the behavior of an ant colony on down to physics itself. It simply takes less work to generate some numbers than others. So, all things being roughly equal, I submit that you'll find a second 0 more often than not. This represents the more ordered possibility, while %b_2=1% would be the more entropic possibility.

Because of these probabilities, the underlying information content of that second bit is changed. After all, if something very likely happens, you've gained very little information; if the sun rises tomorrow, you can't conclude much from it. If the sun were not to rise, well, that contains a LOT of information, information you'd get to enjoy for maybe a month or two before the atmosphere freezes over.
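
That intuition is exactly Shannon's notion of self-information: the information carried by an outcome %x% is \[ I(x) = -\log_2 P(x) , \] so a near-certain event carries almost zero bits, while a wildly improbable one carries a great many.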


Let's try a more mathematical but equally stark demonstration of the same essential argument. Consider: you watch the box for days, and all it does is spit out %0%s, one after another. Millions of them. Each new zero says even less than the one before it (or, arguably, devalues all bits received retroactively—probably moot.)

And then, one day, you wake up and see a single %1% came through. Being a man or woman of science, you eventually think to count exactly how many zeros came out before the %1%, and to your amazement, there are %3,141,592,654% zeros, which you naturally immediately recognize as %\pi \times 10^9%. Even setting aside any deeper implications about what that might portend, it is now obvious that the odds are extremely good that there is an agent or process which is highly structured at work in the box.

All that from a single bit. Clearly that %3,141,592,655^(th)% bit contained more information than the millions before it. You might say it shares its information content among all the bits before it, but it has still, by itself, made an enormous net change to the Shannon entropy and Kolmogorov complexity of the data. As a lower bound, you might say that in such a pattern, where you have %n% %0%s followed by a single %1%, that %1% contains at least a ballpark figure of %log_2 n% bits of actual information, that being the amount of information inherent in the number generated. E.g., %0000 0000 1% gives you %8% zeros, and %log_2 8 = 3%, which is the number of bits you'd require to generate a number between %1% and %8% inclusive.
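
A crude way to see the same asymmetry with everyday tools (bytes instead of bits, and a general-purpose compressor standing in for true Kolmogorov complexity; purely an illustration, not part of the argument):

import os
import zlib

# A highly ordered stream versus a maximally disordered one.
ordered = b'\x00' * 1_000_000 + b'\x01'  # a million zeros, then a single one
random_ = os.urandom(1_000_001)          # incompressible, near-maximal entropy

print(len(zlib.compress(ordered, 9)))  # a tiny fraction of the original size
print(len(zlib.compress(random_, 9)))  # roughly the original size, or a touch bigger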

So what?

I don't know. Stuff is neat.

(Maybe I'll show how primes fit into all this tomorrow!)

Exploring computation space

The birth of Skynet

A couple of years back, I wrote a little program named Skynet. It uses a configurable set of assembly-like instructions to spawn tiny programs, cobbled together completely at random. The original idea was to be able to feed it a set of input and output integer lists representing some function you were trying to find, and have it then blindly fumble its way through millions or billions of those micro-programs until it hit on one that satisfied your conditions.

For example, if you wanted some code that would exponentiate, you might feed it a configuration file signifying

Input 1: 2 3
Output 1: 8
Input 2: 4 3
Output 2: 64

and it would run forever until it stumbled upon a solution, or you killed it.

And it worked, and it was fun to screw around with, if you like that sort of thing. Feel free to grab it from the link above, but it was a quickie that I wrote for myself, so it might be rough to figure out. It's Turing-complete, but it doesn't do anything very clever (there are no fancy machine learning algorithms here, just straight random goodness), though you can specify a few parameters to try to guide its efforts, such as minimum and maximum program length to try, or how many instructions to execute before giving up on a microprogram as a runaway and moving on. I think it also tries to spot and discard obvious dead ends and NOOPs.

Limitations

So yes, it works as intended, but the problem is that the realm of all possible computable functions is one of those seriously immense virtual universes that tend to pop up when dealing with combinatorics or math in general. You might have 16 instructions enabled at one time, but instructions at the assembly level don't accomplish a hell of a lot, and when you consider a branching factor that big, all of a sudden you're looking at %16^{10}% possible microprograms for even a modest 10-instruction snippet of code.

That's just over a trillion, many of which are likely to be non-halting and thus run several hundred instructions before bailing. The upshot is that to exhaustively try all of them, you're probably talking days or even months, depending on how optimized it all is. After quickly discovering the futility of systematically enumerating them, I said "fuck it" and redesigned things with its current random approach, perfectly happy to try out 50-instruction programs if the pRNG requests it. Of course, if there were actually some program out in the ether that needed to be 50 instructions long, the odds of running into it through chance are almost nothing—you could expect to hit it around the same time the last protons and neutrons are decaying.

But I digress. A lot of simpler algorithms can be squeezed into a tractable space.

Things it's figured out:
  • Fibonacci sequence
  • factorials
  • doubling, cubing
  • for loops
  • adding pairs of input numbers on the stack
  • finding the mean of input numbers
  • determining divisibility of input numbers by 7
  • reversing order of stack
Not so much:
  • primality detection
  • prime generation
  • misc. other NP problems
  • bootstrapping into sentience
Details

Some of those successes were much easier than others. Fibonacci, for example, seems to be about the next easiest thing for it to do besides count. The other day I got GitLab CI going which I'm abusing mercilessly by having it spawn Skynet jobs instead of actually testing anything. But manually, this is how it rolls:

[12:00] ~/progs/Skynet> time ./4bit --file fib.skynet -r
fscanfs return 6111111111  
low=4 up=8 exact=0 stopAfter=1 maxSteps=50

Input 1:  
Output 1: 1 1 2 3 5 8 13  
File loaded successfully.  
Randomizing instructions...  
[ INS  SWAB  MULT  PUSH  POP  INC  ADD  JNE  NAND  DEC  PEEK  SKIP  IFF  NOT  LOCO_1  MOD ]

Satisficing program (length 5):  INC  PUSH  ADD  SWAB  JNE  -=- Tracing operation -=-


                        [init   a=0     b=0     c=0     pc=0    sp=1]           ( __ )
        INC             [1      a=1     b=0     c=0     pc=1    sp=1]           ( __ )
        PUSH            [2      a=1     b=0     c=0     pc=2    sp=2]           ( __ 1 )
        ADD             [3      a=1     b=0     c=0     pc=3    sp=2]           ( __ 1 )
        SWAB            [4      a=0     b=1     c=0     pc=4    sp=2]           ( __ 1 )
        JNE             [5      a=0     b=1     c=0     pc=0    sp=2]           ( __ 1 )
        INC             [6      a=1     b=1     c=0     pc=1    sp=2]           ( __ 1 )
        PUSH            [7      a=1     b=1     c=0     pc=2    sp=3]           ( __ 1 1 )
        ADD             [8      a=2     b=1     c=0     pc=3    sp=3]           ( __ 1 1 )
        SWAB            [9      a=1     b=2     c=0     pc=4    sp=3]           ( __ 1 1 )
        JNE             [10     a=1     b=2     c=0     pc=0    sp=3]           ( __ 1 1 )
        INC             [11     a=2     b=2     c=0     pc=1    sp=3]           ( __ 1 1 )
        PUSH            [12     a=2     b=2     c=0     pc=2    sp=4]           ( __ 1 1 2 )
        ADD             [13     a=4     b=2     c=0     pc=3    sp=4]           ( __ 1 1 2 )
        SWAB            [14     a=2     b=4     c=0     pc=4    sp=4]           ( __ 1 1 2 )
        JNE             [15     a=2     b=4     c=0     pc=0    sp=4]           ( __ 1 1 2 )
        INC             [16     a=3     b=4     c=0     pc=1    sp=4]           ( __ 1 1 2 )
        PUSH            [17     a=3     b=4     c=0     pc=2    sp=5]           ( __ 1 1 2 3 )
        ADD             [18     a=7     b=4     c=0     pc=3    sp=5]           ( __ 1 1 2 3 )
        SWAB            [19     a=4     b=7     c=0     pc=4    sp=5]           ( __ 1 1 2 3 )
        JNE             [20     a=4     b=7     c=0     pc=0    sp=5]           ( __ 1 1 2 3 )
        INC             [21     a=5     b=7     c=0     pc=1    sp=5]           ( __ 1 1 2 3 )
        PUSH            [22     a=5     b=7     c=0     pc=2    sp=6]           ( __ 1 1 2 3 5 )
        ADD             [23     a=12    b=7     c=0     pc=3    sp=6]           ( __ 1 1 2 3 5 )
        SWAB            [24     a=7     b=12    c=0     pc=4    sp=6]           ( __ 1 1 2 3 5 )
        JNE             [25     a=7     b=12    c=0     pc=0    sp=6]           ( __ 1 1 2 3 5 )
        INC             [26     a=8     b=12    c=0     pc=1    sp=6]           ( __ 1 1 2 3 5 )
        PUSH            [27     a=8     b=12    c=0     pc=2    sp=7]           ( __ 1 1 2 3 5 8 )
        ADD             [28     a=20    b=12    c=0     pc=3    sp=7]           ( __ 1 1 2 3 5 8 )
        SWAB            [29     a=12    b=20    c=0     pc=4    sp=7]           ( __ 1 1 2 3 5 8 )
        JNE             [30     a=12    b=20    c=0     pc=0    sp=7]           ( __ 1 1 2 3 5 8 )
        INC             [31     a=13    b=20    c=0     pc=1    sp=7]           ( __ 1 1 2 3 5 8 )
        PUSH            [32     a=13    b=20    c=0     pc=2    sp=8]           ( __ 1 1 2 3 5 8 13 )
        ADD             [33     a=33    b=20    c=0     pc=3    sp=8]           ( __ 1 1 2 3 5 8 13 )
        SWAB            [34     a=20    b=33    c=0     pc=4    sp=8]           ( __ 1 1 2 3 5 8 13 )
        JNE             [35     a=20    b=33    c=0     pc=0    sp=8]           ( __ 1 1 2 3 5 8 13 )
        INC             [36     a=21    b=33    c=0     pc=1    sp=8]           ( __ 1 1 2 3 5 8 13 )
        PUSH            [37     a=21    b=33    c=0     pc=2    sp=9]           ( __ 1 1 2 3 5 8 13 21 )
        ADD             [38     a=54    b=33    c=0     pc=3    sp=9]           ( __ 1 1 2 3 5 8 13 21 )
        SWAB            [39     a=33    b=54    c=0     pc=4    sp=9]           ( __ 1 1 2 3 5 8 13 21 )
        JNE             [40     a=33    b=54    c=0     pc=0    sp=9]           ( __ 1 1 2 3 5 8 13 21 )
        INC             [41     a=34    b=54    c=0     pc=1    sp=9]           ( __ 1 1 2 3 5 8 13 21 )
        PUSH            [42     a=34    b=54    c=0     pc=2    sp=10]          ( __ 1 1 2 3 5 8 13 21 34 )

                                         ...

Total=357252            Timeouts=206816         Redundant=6694


real    0m0.130s  
user    0m0.127s  
sys     0m0.002s  
[12:00] ~/progs/Skynet>

As may or may not be clear, it spits out its working program at the top, and then walks you through a trace execution.

One of my discoveries was that program length matters a lot less than I would have guessed. You can set your program length bounds to look for stuff 30-50 instructions long, and it'll still pop out a Fibonacci program after a second. This is presumably because when you throw shit randomly together, you end up with an abundance of NOOPs (at least with my instruction set.)

To put that in slightly more concrete perspective, there are one or two 5-length versions of Fibonacci that exist, out of something like a million possible programs; put another way, you could say ~0.0002% of 5-lengthers are Fibonacci programs. But as it turns out, if you go out to 20 instructions, that percentage doesn't drop by much—in fact, for some things it climbs. The point I'm trying to make, which surprised me, is that regardless of the program length you're searching for, you'll hit anything provided it's simple enough to fit in that length or simpler, and these types of algorithms seem to occupy a pretty constant market share at any depth.

Present day

(Well, yesterday.)

I discovered PyPy and Cython, things which I should have known about a long time ago but didn't. I wrote Skynet in C because it's second to none for this sort of thing, probably a 30x speed increase over the same thing in vanilla Python. But Cython lets you statically type variables, embed C code, and other wily stuff, while PyPy appears to be a magical JIT compiler that makes your pure Python run 5-10x faster (if it's this sort of project).

Anyway, I wanted to try them out, so last night I ported Skynet back to Python, yielding skython, a name which I'm half proud of but made me die a little inside to use. After I got the basics working, I started to adapt it for Cython, but realized I would effectively be porting it awkwardly back to C-ish if I continued, so I've stuck with Python and am using PyPy, which seems to run it at maybe 25% of the speed of the C version. In other words, close enough to be an acceptable trade-off. It's not up to snuff yet, but it is working, and it's certainly faster to add features than it was with C.

To wit: I was thinking about the program-market-share concept I mentioned, and it led me to add a method which creates programs randomly without looking for any input or output, but rather keeping track of everything it churns out on its own. And this is where we actually rejoin the title and original thematic intent of this post, because the idea there was to explore the distribution of algorithmic information in computation space, by which I mean seeing just how often it opts for some sequences over others when selecting out of the vast set of all computable functions.
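
To make that concrete, here's a toy version of the census idea (a made-up two-register machine with a handful of instructions, not the actual Skynet/skython instruction set):

import random
from collections import Counter

INSTRUCTIONS = ["INC", "DEC", "ADD", "SWAB", "PRINT", "JNZ"]

def run(program, max_steps=200):
    # Execute a random microprogram and return whatever it printed.
    a = b = 0
    out = []
    pc = 0
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op = program[pc]
        if op == "INC":
            a += 1
        elif op == "DEC":
            a -= 1
        elif op == "ADD":
            a += b
        elif op == "SWAB":
            a, b = b, a
        elif op == "PRINT":
            out.append(a)
        elif op == "JNZ" and a != 0:
            pc = 0
            continue
        pc += 1
    return tuple(out[:10])  # truncate long outputs so streams stay comparable

census = Counter()
for _ in range(100_000):
    program = [random.choice(INSTRUCTIONS) for _ in range(random.randint(4, 8))]
    census[run(program)] += 1

for output, count in census.most_common(10):
    print(count, output)

In the same spirit as the results below, the empty output is by far the single most common result, and anything with real structure lives far out in the tail.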

And?

And it was interesting. I gave it some pretty neutral parameters and had it run a billion microprograms. There's lots I could have done (and perhaps will do) to clean up and solidify the results, but there was enough to get a sense of things.

A few results
  • Half the programs do nothing. Give or take.
  • Most of the rest, percentage-wise, print out endless strings of 0s or 1s.
  • The last few percent, though, starts to get interesting.
  • When you get down to 0.5% or so, you start to get 0101010... and 123456... sequences.
  • They're followed close behind by the 22222s and the 110110s and the 012012s... etc.

The first eye-catching result is down at 0.00272% computation-space-share (but this is still in the top percent of frequency):

0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535,131071,262143,524287,1048575,2097151,4194303,8388607,16777215  

The pattern's clear enough. A little ways on, down at 0.00045%, I spot a 1, 4, 7, 10, ... along with the first Fibonacci. Down further, I recognize a pattern that came up in some math problem I worked on months ago:

0,1,2,5,10,21,42,85,170,341,682,1365,2730  

But as I went deeper, I found sequences that clearly had structure but were foreign to me:

1,3,8,20,49,119,288,696,1681,4059,9800,23660,57121,137903  

I started looking them up on OEIS, and found that most of the interesting-looking patterns were indeed in there, and at least frickin' half of them were related in some way to the Fibonacci sequence. Not that this is especially surprising or perplexing in hindsight, I'd just never given much thought to how fundamental the principle is upon which it's built. No wonder that it pops up in nature everywhere; God must have noticed the same thing. So yeah, there are some interesting patterns when you get deep, but I didn't find anything earth-shattering.

Zip 'em up!

When I got tired of that, I moved on to a test of algorithmic information theory. The expectation is that the more frequently a program appears, the less complexity it is apt to have. A quick'n'dirty test for complexity is compressibility—the more complex a thing, the less compressible, to the point where any data sufficiently close to random (which is 'perfectly' complex, in this sense) will only get bigger.

I had my big honkin' 88mb data file, sorted by output frequency, so first I split it into equal parts, going from least to most common outputs:

[13:17] ~/progs/skython> split -da 2 -n 25 1bil.txt
[13:18] ~/progs/skython> ls -l
-rw-r--r-- 1 tc tc  88516009 Aug  1 09:46 1bil.txt
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x00
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x01
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x02
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x03
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x04
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x05
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x06
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x07
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x08
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x09
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x10
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x11
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x12
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x13
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x14
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x15
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x16
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x17
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x18
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x19
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x20
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x21
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x22
-rw-r--r-- 1 tc tc   3540640 Aug  1 13:18 x23
-rw-r--r-- 1 tc tc   3540649 Aug  1 13:18 x24

And then I zipped 'em:

[13:18] ~/progs/skython> gzip -9 x??; ll
-rw-r--r-- 1 tc tc  88516009 Aug  1 09:46 1bil.txt
-rw-r--r-- 1 tc tc    851286 Aug  1 13:18 x00.gz
-rw-r--r-- 1 tc tc    850941 Aug  1 13:18 x01.gz
-rw-r--r-- 1 tc tc    851373 Aug  1 13:18 x02.gz
-rw-r--r-- 1 tc tc    852635 Aug  1 13:18 x03.gz
-rw-r--r-- 1 tc tc    849508 Aug  1 13:18 x04.gz
-rw-r--r-- 1 tc tc    857673 Aug  1 13:18 x05.gz
-rw-r--r-- 1 tc tc    853534 Aug  1 13:18 x06.gz
-rw-r--r-- 1 tc tc    852645 Aug  1 13:18 x07.gz
-rw-r--r-- 1 tc tc    854754 Aug  1 13:18 x08.gz
-rw-r--r-- 1 tc tc    854989 Aug  1 13:18 x09.gz
-rw-r--r-- 1 tc tc    855178 Aug  1 13:18 x10.gz
-rw-r--r-- 1 tc tc    848454 Aug  1 13:18 x11.gz
-rw-r--r-- 1 tc tc    846578 Aug  1 13:18 x12.gz
-rw-r--r-- 1 tc tc    844529 Aug  1 13:18 x13.gz
-rw-r--r-- 1 tc tc    849767 Aug  1 13:18 x14.gz
-rw-r--r-- 1 tc tc    849013 Aug  1 13:18 x15.gz
-rw-r--r-- 1 tc tc    846908 Aug  1 13:18 x16.gz
-rw-r--r-- 1 tc tc    852737 Aug  1 13:18 x17.gz
-rw-r--r-- 1 tc tc    847214 Aug  1 13:18 x18.gz
-rw-r--r-- 1 tc tc    821187 Aug  1 13:18 x19.gz
-rw-r--r-- 1 tc tc    760842 Aug  1 13:18 x20.gz
-rw-r--r-- 1 tc tc    752602 Aug  1 13:18 x21.gz
-rw-r--r-- 1 tc tc    727900 Aug  1 13:18 x22.gz
-rw-r--r-- 1 tc tc    692860 Aug  1 13:18 x23.gz
-rw-r--r-- 1 tc tc    605098 Aug  1 13:18 x24.gz

...and as expected, they all got smaller, because these were text dumps which are always gonna be highly compressible. But that aside, it's evident that gzip had a much easier time with the more common stuff. Again, no surprise, but cool. Note the implicit curve: the sizes stay effectively flat across the first 75 or 80% of the files, then drop off sharply for the chunks holding the most common outputs.

One last thing to check

I'm beginning to suspect that primes are inherently complex, not just from number theory but in a Kolmogorov-y, robust sense. More Linux tricks:

[10:24] ~/progs/skython> for i in {1..30}; do cut -s --delimiter=, -f $i 1bil.txt >> 1bil_vals.txt; done
[10:27] ~/progs/skython> sed 's/\(\s\|\-\)//g' 1bil_vals.txt | uniq -c | sort -gr > 1bil_sorted_abs_final.txt

which gives a sorted text file of how many times every individual term came up overall, ignoring patterns:

[10:30] ~/progs/skython> head -20 1bil_sorted_abs_final.txt 
16245701  
11338986 1  
8543839 0  
5732203 2  
3037257 3  
1661122 4  
 980248 5
 838860 6
 582802 7
 457519 8
 338796 9
 289842 10
 226633 12
 208586 11
 161242 13
 154787 14
 149209 15
 140625 16
  90908 18
  90094 17

It's my suspicion that primes will be disproportionately low on this list—as they should be, given how computers work—and a first glance confirmed that somewhat, but then I stopped to write all this.

Okay. That's it. Here's the big ugly data file if you want it. Maybe I'll write up my thoughts on an approach to %P \stackrel{?}{=} NP% via transfinite induction tomorrow. Peace out.