On the persistence of zeros in Rule 30

A tentative proof of the aperiodicity of the center column in Wolfram’s Rule 30 open question.

The beginning of Rule 30 in the standard configuration with center column highlighted.


  1. Instead of assigning ones and zeros, we’ll be using boolean algebra to represent cell values. This allows us to ratchet up the complexity of the system in a slightly more general way. Let $k$ be the number of contiguous boolean variables we start out with; these variables are set on top of an infinitely long background of $0$ cells (per the problem definition). We refer to these variables as seed variables, and the top row as the seed row.
  2. We show that the distribution of those boolean expressions which appear infinitely many times in a column are well-behaved with regards to the distribution of seed variables.
  3. One of the effects of this regular distribution is the guarantee of null cells in every column.
  4. We show that the presence of such a null cell—that is, a cell containing only a $0$, indicating it cannot be impacted by seed variables—is sufficient to prevent center column cycling.

1. Computing Rule 30 as boolean expressions.

The principle is simple. Rather than starting with ones and zeros, we start with an arbitrary number of variables. All subsequent calculations are carried out in the same way. The principle operation of Rule 30 is $p \otimes (q \lor r)$, which we’ll be representing in boolean algebra as $f(a,b,c)=a+b+c+bc \pmod 2$.

To give some concept of this, see the three examples below, representing 1-, 2- and 3-variable configurations respectively.

See my earlier post here for a slightly gentler approach to most of this if any of it is confusing.

Note that when carrying out the Rule 30 function in this context, we can do some simple optimizing simplification to keep things clean. Exponents can be dropped without effect, as $1^m=1$ and $0^m=0$. Terms with even coefficients can be dropped completely, as they’re modded out, and odd coefficients can be omitted, i.e. an implied $1$. This leaves you with tidy polynomials that are simply sums of products, having no constants other than the occasional $0$ when everything else cancels.

As you can see in the tables above, the 1-seed-variable version has a roughly equal number of $a$ and $0$ cells. Note that we could assign the value $1$ to $a$, and it immediately becomes a standard Rule 30 layout. (Note also that any empty cells contain $0$s as well, but are left blank for legibility.)

Moving on to the 2-variable version, I assert the expected frequency of a $0$-cell drops to $1/4$. Likewise, for the 3-variable version, it drops to $1/8$. Where it goes from there actually depends on interpretation.

2. Single-variable vs. extended polynomials

Using $a+b+c+bc$ as a naive formula explodes to intractability nearly instantaneously. We save quite a lot by employing the simplification shortcuts listed earlier, but even that’s not enough; the complexity of the full expressions, even simplified, grows doubly exponentially. Using $k$ seed variables and our simplified polynomial form, there are still $2^{2^k-1}$ possible distinct expressions that can be formed. Not all of them are used, but enough are so that we have to simplify yet again. For the record, I believe that the number of them squares every step after $k=3$, giving something along the lines of $8^{2^{k-3}}$ distinct recurring polynomials. (These polynomials are interesting in their own right; I believe each one may more or less encode the entire structure to that point.)

This is the unabridged (and simplified!) polynomial for the center column after only row 4.

Instead, we’ll accept that we’re really only interested in the case where one and only one seed variable is switched on. With that assumption, any product in those polynomials must have a $0$ and can be ignored, and only the single-variable monomial terms are of interest to us. There are several viable ways to generate these expressions; the easiest turns out to be simply running Rule 30 as usual, reversing the bit order, and swapping in seed variables for $1$ bits.

Using that, the monster expression above becomes $a_1+a_5+a_8+a_9$. So yes, a tad easier to work with.

3. The utility of a null cell.

These are not to be confused with the more familiar $0$ you’d find in a cellular automaton; the $0$s we’re concerned with are boolean expressions, in particular empty ones, and thus are much more infrequent.

Once we have our algebraic interpretation laid out, we can manipulate things to some extent. If we assign a $1$ value to any single one of our seed variables, it will effectively horizontally translate the entire structure. If we want, we can even assign the values ordinarily corresponding to a later row, which has the effect of translating the structure vertically.

For our purposes, all we need to realize is that if we encounter a $0$-expression cell, it means that no possible setting of the current seed variables can toggle this bit. Consider that, within the bounds of our seed variable sequence, we could opt to place our initial $1$ seed cell all the way to the left, or way off to the right; yet, if a cell has only $0$ in it, clearly turning on any one bit just won’t be sufficient to light up that cell. In fact, it indicates that the corresponding $k$ cells in the row with the zero are all $0$-valued, in the typical sense of $0$. This in turn indicates a “white triangle” of width at least $k$, and implies a vertical run of $0$ cells of at least $k/2$, the height of the triangle.

Putting that all together, if we are able to start with an arbitrarily large $k$ and still be guaranteed to find a null expression eventually, that directly implies an arbitrarily large run of $0$ values in the center column. For any alleged period of length $p$, we need only set $k >2p$ and it shows such a cycle cannot exist.

Note that while I believe this proof is perfectly happy finding such null expressions in the center column itself, which is ideal if possible, it would also suffice to be able to reliably find one within any fixed constant distance from the center, provided we are still able to take an arbitrarily large $k$. Again, the important aspect is to expose a large enough run of zeros that it overshadows the center column and guarantees no cycle can survive the interruption.

4. Proof of existence.

This is the part I’d been stuck on since December. Everything up to this point is all well and good, and I am quite certain it’s all correct and provable. But how can we actually be sure we’ll see these null expressions? Their existence follows from virtually any heuristic or common sense approach you can take, yet they’ve been obnoxiously resistant to any real proof in that typical number-theoretic sort of way. Here goes nothing.

It’s readily apparent that any infinitely long column will repeat some number $m$ of distinct boolean expressions infinitely many times, given that there can only be finitely many such expressions under our constraints. In fact, since we’re only dealing with single-variable addends, the maximum number of distinct boolean expressions we can see in a cell, given $k$ seed variables, is $2^k$, the power set of those seed variables. For instance, $k=3$ means there will be at most $2^3=8$ different expressions, namely $\{0,a,b,c,a+b,a+c,b+c,a+b+c\}$.

However, those are possible expressions, and we have no guarantee we’ll see all of them. What we can say right off the bat is that the amount we’ll see must increase monotonically with $k$, as adding a fresh additional variable cannot possibly lessen the number of distinct expressions we have.

Moving on: there is some set $S$ of such expressions that will repeat infinitely many times for a given column. Ideally, we would like to show that $0$ is always in these sets, and as it turns out, there might be an easy way to do that. Since I prefer explanation through example, an example:

Suppose $k=3$, meaning I have the seed variables $a,b,c$ on my top row. Suppose that to this point, $0$ still appears in $S_x$ for all columns $x$.

Let’s increment to $k=4$, adding $d$ to our top row. It’s not hard to show that every seed variable appears in every row and every column at least once. Given that, we know that one or more of our expressions in $S$ now contain $d$. There are two primary cases we need to consider:

  1. $d \in S$.
  2. $d \not\in S$.

In the first case, we have an element in $S$ which is exactly $d$, meaning the expression $d$ appears by itself in a cell infinitely many times in this column. Since any new seed variable yields new expressions in a strictly additive way, every new term in $S$ will be a logical-OR in the form $X+d$, where $X \in S$ is some pre-existing expression. For a variable like $d$ to appear on its own requires that prior to its inclusion, we have $0 \in S$, as only $0+d=d$. Thus case 1 affirms the presence of a $0$ in $S$.

There is arguably a case to be made that that most recent seed variable was somehow perfectly arranged to hit every remaining $0$ cell in that column, effectively removing it from $S$, but this is handled by the second case. Suppose that $d$ replaced all remaining $0$ cells; if so, proceed to add another seed variable, and continue with the following argument.

In the second case, there is no single $d$ added to $S$. This means that either no null cell remains, or through some dark magic, the zeros and the $d$’s never overlap.

In either case, we’re still guaranteed at least one expression containing a $d$. This one will necessarily have one or more other variables which would also trigger the cell, perhaps something like $b\lor c \lor d=b+c+d+bc+bd+cd+bcd=b+c+d$. Remember, since we are asserting only one seed variable may be turned on, we still omit products.

The issue now is that this expression, and any other multi-variable expressions, can be removed by unwinding our process using an alternate path. Suppose $S=\{a,b,c,a+b,b+c,a+d,b+c+d\}$, an impossible but illustrative example. Let’s remove seed variable $a$, something we’re perfectly allowed to do whenever we like; this has the effect of removing any element of $S$ referencing $a$, leaving $S=\{b,c,b+c,b+c+d\}$. Next we’ll remove $b$ (which we’ll assume was magically covering all $0$s for this example), leaving $S=\{0,c\}$, and finally we remove $c$, leaving $S=\{0\}$. If $0$ is our only potential expression in any column, it indicates that no seed variable can be turned on, as otherwise any variable would plow inexorably along the left and right diagonals, altering every row and column at some cell.

However, we never actually removed $d$ explicitly, meaning that by rights, we ought to be able to set it to $1$ and light everything up. This clear impossibility is our contradiction, and the implication is that any time we add a seed variable, it must be of Case 1 type, meaning it includes itself as an entire cell expression. And since that implies that a $0$ is present or was present on the previous step, and since we can carry this out multiple times, we may be assured that a $0$ is always in $S$, for any column, which is sufficient to prove that the center column never becomes periodic.

Rule 30 walls

ECA matrices

More art?

Rule 22

Rule 37

…and more.

factor pushing

Factor pushing is an idea I came up with when trying to establish an easy proof of the existence of a prime in $(n,n^2)$. I haven’t managed that yet, but it’s still on my bucket list.

To do it, you take some set of numbers, write them out in a row, and then generate new rows with a simple transform. Typically, this transform is just to additively shift the gpf (greatest prime factor) of a number to its right. Or if you prefer,

$$n_i \leftarrow n_{i} – \textrm{gpf}(n_i) + \textrm{gpf}(n_{i-1}),$$

where typically the number of columns is fixed and the values effectively wrap around.

The concept is to tease out prime values by hashing things around until they settle into place. Since the gpf of a prime number is simply that number, the transform causes primes to travel unchanged along a diagonal until they hit a composite. Moreover, there are two important unchanging properties to the whole setup:

  1. The sum of any given row is fixed.
  2. Once a column has a prime value, it will always have a prime value.
Factor pushing of the first 15 odd numbers.

In the example above, composites are highlighted blue, everything else is prime (or 1). As only composites are affected, the first change worth noticing is in column 5, corresponding to $3^2=9$. It additively loses a 3, which is emitted to the right on the next row, which leaves a 6, which combines with the incoming 7 from the left to yield 13. The upshot is that what was previously $7,9$ has been reshuffled to be $13,3$, having the same total but now both prime. Like so:


You continue the process until it repeats, which signifies either all primes, or catastrophic failure. Given a suitable configuration, it’s pretty clear this will always work, but proving it seems hard. W.r.t. the motivating idea, proving the existence of a prime in $(n,n^2)$ for any $n$, you would need only plug in the first $n$ odd integers. Since they sum to $n^2$, and there are $n$ values, then if every value is prime, at least one of those must be in that range simply because the mean value is $n$, immediately ensuring at least one value larger and one value smaller than that (barring using all $n$’s).

It turns out there are lots of neat effects you see depending on the exact sort of set you provide. Of particular note is what happens when even numbers are included.

You’ll notice values of $2$ are green and prime powers are beige. Using the standard transform, a group of odd numbers can never develop an even number: each step, an odd number will be subtracted, and an odd number will be added, leaving the parity unchanged. This is because only primes that are present as a factor in one of the values will be in play, so if no twos exist anywhere as a factor at the outset, they never will.

When even numbers are included, as in the example above, you’ll notice that those columns which start out even tend to be composite for much longer than their odd counterparts. This is because if parity is typically left unchanged, and a number has a factor of $2$, there are only two ways to get rid of it:

  1. Have a $2$ to its left.
  2. Through its random-walk-like navigation of integers, land on a prime power of $2$.

Both things will flip the parity of the number, with one important difference. If a $2$ comes in from the left, that $2$ is effectively consumed and stops traveling along, either by annihilating an existing factor of $2$ in the composite number, or if the composite is odd, being absorbed to make it even. If instead a column spontaneously finds some $2^k$, then it will be odd thereafter, but will also emit a $2$, since that’s its largest factor.

The subjective effect reminds me of particles and antiparticles. You can see in the table above how composite columns every once in a while spontaneously emit a $2$ particle after finding a prime power, which then travels along until it encounters some other even composite column, whereupon they cancel each other out. You get truly enormous sets of primes out of this under the right conditions. More interestingly to me, just by inspecting the table, you might notice that this property pairs up all the even columns: each one is the emitter or receiver of a $2$ at least once. In fact, every even number can easily be paired with exactly one other number, even in seemingly more complicated situations.

In the table above, $28$ is paired with $34$. This is because $28$ finds $2^5$, emits a $2$, which then hits column $36$, but connects backwards to $34$, which had its own particle emission. If you think of this in terms of 1s and 0s, there’s a clear unbroken line connecting the first value in each column. These lines cannot cross: any even number inside the range of a connected pair must have its own partner also in that range.

If you do render this sort of situation as 0s and 1s, and plot it, you get something like this:

This is the first however-many integers, just as in the table earlier, but simply showing black for odd values and white for even values. The pairing up is pretty clear here. Note that in this version, I’m not using a wrap-around finite sequence, but rather a finite range of values which doesn’t wrap around, but does import from the left and export to the right, so to speak. You’ll notice a few lines extending all the way to the bottom: these signify columns that are remaining composite and even for longer than was practical to calculate or render. Theoretically, however, every line must eventually terminate in a connection to some other line, somewhere. In the finite wrap-around setup, the same is true, although if you start with an odd number of even numbers, you’ll have one left over, which presents as there being a single $2$ in the resulting row of primes.

At the time I was doing this, I did play around with a number of variants for visualizing the same or similar setups. The smoother ones are from using logs on a similar approach using lpf instead of gpf; in the wide one, you can clearly see the transition between negative and positive integers. The relative brightness indicates more composite activity in the negatives, something that is even starker in the final picture, which is what it looks like if you look at factors of $3$ instead of $2$.

h vs. v

default rule 30

1. horizontal union

2. vertical union

Proof summary for aperiodicity of Rule 30

This is rule 30.

Standard practice is to consider black cells to have value 1, and white cells to have 0.

The center column is highlighted, as we are interested in determining whether or not it’s possible for it to ever degenerate into a cyclic pattern.

Today, I have a pretty good outline for how to show that it cannot.

Rule 30 as boolean algebra

We can always substitute variables in place of the hard 1s and 0s you usually deal with in cellular automata.

1 variable

Here’s one of the simplest replacements we could do.

The empty spaces should be filled by 0s too, but are left blank for clarity.

Here, we merely replaced the single cell that starts all this, the \(1\) at the top, with $a$. That $a$ is propagated through the system exactly how the $1$ would be. Note that it would also be legitimate (if boring) to let $a=0$, in which case we’re left with an empty universe.

2 variables

If we add another variable, things begin to get interesting:

If we let $a=0,b=1$, then $b$ becomes the center column exactly as before. If we set $a=1$, however, it becomes the de facto center column*, regardless of the value of $b$. For a while, I thought there may be a trick along those lines to show that neither one could therefore be cyclic, but I wasn’t able to find anything, other than a conclusion that the appearances of $a$ and $b$ must be aperiodic even if the resulting pattern is not.

Distinct cell expressions
for 2 variables.

Apart from that, the situation is also getting slightly messier. We now have several values we see coming up repeatedly, listed in the table to the right. This is our first look at something that will become critical later, that new expressions are generally built by pasting two different sets together.

* We don’t mean literal center column here, but rather that it will contain the same pattern found in the typical Rule 30 configuration with a single $1$ cell in the center column seeding the structure.


Incidentally, without simplification, an algebraic approach would be immediately intractable. If all we do is apply $f(a,b,c):=a+b+c+bc$, we can see the problem after only a couple of steps, in the same 2-variable configuration used above:

To limit the combinatorial explosion, there are two simple optimizations we can make since we’re dealing with boolean values. First, all exponents can be dropped, since $0^m=0$ and $1^m=1$ for any $m$. Second, we drop any terms with even coefficients, and strip the odd coefficients, which takes our expressions most of the way toward$\pmod{2}$ as they’re meant to be. Alas, even after all that, we’ll still face doubly exponential growth.

3 variables

When we add a third variable, things move into full gear.

Note all the options here: we could make a, b, or c the center column by setting one of them to 1… or by setting all three to 1,
we’re effectively shifting forward time by one step, since 111 is the second row following a single start cell.
Distinct cell expressions for 3 variables.

Although there are clearly a lot of new expressions here, not all possible expressions are represented. For $k$ variables, able to combine in up to $2^k$ additive terms, each of which may itself include or not include $k$ variables, the number of raw possible expressions grows doubly exponentially as $2^{2^k}$. That growth rate has not been the case so far, but it will be from here on out, presumably a result of reaching the three operands that Rule 30 expects. As a result, the number of distinct algebraic expressions quickly becomes unmanageable even after our simplifications.

Fortunately, we won’t need to manage it. There’s an invariant property that readily scales with this, and that’s what we’ll be using here.

Continuing the pattern

Total additive term count
from 4-variable
distinct expressions.
Total additive term count
from 3-variable
distinct expressions.

There are a few expressions that do appear that we’ll be ignoring, but this is because they’re either part of the cycling diagonals off to the side, which doesn’t help us, or because it’s one of the first few entries in the column, which don’t actually repeat. (Interestingly, any expression in the form $a+b+c+bc$ falls into this category, and only appears once.) Given that, let’s focus only on expressions that appear on row $n\geq 3$, in the center column.

# variables ($k$)distinct cell expressions

For all $k \geq 3$, if you process sufficiently many rows to see every legal expression at least once, you’ll find that within those $2^{3\cdot 2^{k-3}}$ distinct expressions, if you chop up the polynomials and take a raw count of all monomial terms, every possible term appears, and each one appears in precisely one-half of all expressions. The only exception to this is the null-set-expression $0$, which may occur as infrequently as $1$ in $2^{2^k}$ cells (but needs further research), and any triplets like $bcd, cde, def, \ldots$, consisting of exactly three distinct consecutive in-range variables. There are only $k-2$ of these, and they appear in only a quarter of the overall list.

But back to the main story here. Yes, they all share very equitably, and in fact, it’s so even that if you remove every expression containing any given monomial or subexpression, the count of all remaining monomials is exactly halved, suggesting an extremely organized distribution.

Take the 4-variable case as an example, which will have $64$ distinct expressions. This means that $32$ of those will have $a$ as a monomial, and $16$ of the other $32$ will have $b$ as a monomial, and $8$ of the remaining $16$ will have $c$ as a monomial, and finally $4$ of those last $8$ will have $d$ as a monomial (with no $a$, $b$, or $c$). That leaves $4$ expressions to be accounted for; of those, $0$ uses one slot, and the other three will all have all multivariable monomials.

Any of those final four expressions is indicative of a vertical run of $0$s of width at least $4$ and height at least $2$. This principle can be extended to any number of variables to show arbitrarily large contiguous spans of $0$, disallowing any periodicity.

Ratcheting up complexity

The tables we’re building are additive in the sense that, upon adding a new variable to the seed row, no existing addends will ever be removed as a result. This property gives us a welcome measure of stability. Instead, those cells that do change values can do so only by taking on additional additive terms, all of which must be divisible by the new variable.

Given that, let’s look at the actual driving force behind the complexity generation: the multiplicative operation. In fairness, the real source of the complexity is the interweaving of multiplication and addition (as is true in literally all things), but combinatorially speaking, the multiplication is the killer.

The additions are able to mix and match terms to some extent, and thus are half of that double exponential, but the multiplications are able to glue together what effectively become new proxy variables which must be independently tracked downstream, amplifying the capacity of the additions to swirl together novel expressions. In particular, every new seed variable appears to permeate the system sufficiently such that all $2^k$ possible terms appear.

That said, it’s worth repeating that although all terms appear, not all possible expressions do. As mentioned, the general $a+b+c+b c$ form is a one-and-done, and also worth special attention is the lack of solitary single variable expressions: $a$ is the only one that you’ll see, discounting degenerate cycles on the diagonals. For any variable past $a$, you will never see it on its own after being introduced; in every expression where it appears as a term, it’s always part of a larger whole, e.g. you’ll see $c+ac+bc$, at the very least. This is a general pattern that seems to hold: $d$ never appears on its own but requires at least $d+bd+cd$, and so forth. Close examination of the genesis of these terms in the tables below makes some sense of this behavior.

Although these tables are obviously becoming unwieldy, here are the beginnings of the 4- and 5-variable progressions.

Without serious optimization, identifying the distinct expressions for the 5-variable case is right at the limit of what my computer can reasonably handle using 64gb of memory and a day of computing time. I suspect that no amount of optimization would allow brute force verification of the 6-variable case without special hardware. So while on the one hand, I am basing my theory on an extremely limited data set, the specific nature of the data strongly suggests to me that the patterns identified likely hold at any level, but to prove it, I still need to work out a more compelling theory of how those exact properties do hold.

Mechanism of operation

Distinct cell expressions for 3 variables.

Consider the progression when starting with $\{a,b,c\}$. You get $8$ distinct expressions from that; in fact, we’ll pop up that table now.

Now consider the progression when starting with $\{b,c,d\}$. By this, I mean with $b$ and $c$ in their usual columns, and padded by $0$ as always, so yes: this is isomorphic to the $\{a,b,c\}$ progression. See, I’m attempting to lay conceptual groundwork for picturing them coming together from their own individual starting points, so we can think through what happens when we combine them as with $\{a,b,c,d\}$.

It should be obvious that $\{b,c,d\}$ on its own will also generate eight unique expressions, and essentially the same ones that are given here. To determine them exactly, it generally suffices to simply shift all the variables forward or backward. Doing this, we find that all of the expressions for $\{b,c,d\}$ will be different (excepting $0$).

Conceivably, the merging process between these two sets will somehow come to settle on $8^2=64$ distinct expressions, highly suggestive of pairing, but it’s unclear how to get more precise than that. And thus I guess we’ve gotten to where I’m stuck.

I’m probably better off sticking with the inductive-like approach of considering what happens when you add one more seed variable. The even splitting of terms completely regularly between all expressions, as well as the number of expressions itself simply repeatedly squaring, both scream that there is a very orderly process behind this part of things, but I can’t put my finger on it if so. And I wish I could, because a proof immediate follows if I could show this distribution holds, or even a decent tangential result.

Variable displacement

When $b=1$, the center column must be periodic. This means that all cells not dependent on $b$ must be periodic regardless of anything else.

When $b=0$, then $a$ controls whether the left or center column behaves as the typical center column. If $a=0$, then the center is the center and must be periodic; if $a=1$, then the left column becomes the center and must be periodic.