More art?

…and more.

## factor pushing

Factor pushing is an idea I came up with when trying to establish an easy proof of the existence of a prime in $(n,n^2)$. I haven’t managed that yet, but it’s still on my bucket list.

To do it, you take some set of numbers, write them out in a row, and then generate new rows with a simple transform. Typically, this transform is just to additively shift the gpf (greatest prime factor) of a number to its right. Or if you prefer,

$$n_i \leftarrow n_{i} - \textrm{gpf}(n_i) + \textrm{gpf}(n_{i-1}),$$

where typically the number of columns is fixed and the values effectively wrap around.
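As a minimal sketch, one step of this transform could look like the following in Python. The names `gpf` and `push_step` are my own, and treating `gpf(1)` as `1` is an assumption made so that a `1` travels along just like a prime does.

```python
def gpf(n):
    """Greatest prime factor of n >= 1, with gpf(1) taken to be 1 (an assumption)."""
    largest = 1
    d = 2
    while d * d <= n:
        while n % d == 0:
            largest = d
            n //= d
        d += 1
    # Whatever remains above 1 is a prime factor larger than any found so far.
    return n if n > 1 else largest


def push_step(row):
    """Apply n_i <- n_i - gpf(n_i) + gpf(n_{i-1}) to every column, wrapping around."""
    g = [gpf(x) for x in row]
    # Index -1 wraps to the last column, so column 0 receives from the far right.
    return [row[i] - g[i] + g[i - 1] for i in range(len(row))]


row = [1, 3, 5, 7, 9, 11]
print(push_step(row))  # [11, 1, 3, 5, 13, 3]
```

The row sum is 36 both before and after the step, and the only composite, 9, has become 13 while passing a 3 to its right.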

The concept is to tease out prime values by hashing things around until they settle into place. Since the gpf of a prime number is simply that number, the transform causes primes to travel unchanged along a diagonal until they hit a composite. Moreover, there are two important unchanging properties to the whole setup:

1. The sum of any given row is fixed.
2. Once a column has a prime value, it will always have a prime value.

In the example above, composites are highlighted blue, everything else is prime (or 1). As only composites are affected, the first change worth noticing is in column 5, corresponding to $3^2=9$. It additively loses a 3, which is emitted to the right on the next row, which leaves a 6, which combines with the incoming 7 from the left to yield 13. The upshot is that what was previously $7,9$ has been reshuffled to be $13,3$, having the same total but now both prime. Like so:

You continue the process until it repeats, which signifies either all primes, or catastrophic failure. Given a suitable configuration, it’s pretty clear this will always work, but proving it seems hard. W.r.t. the motivating idea, proving the existence of a prime in $(n,n^2)$ for any $n$, you would need only plug in the first $n$ odd integers. Since they sum to $n^2$, and there are $n$ values, then if every value is prime, at least one of those must be in that range simply because the mean value is $n$, immediately ensuring at least one value larger and one value smaller than that (barring using all $n$’s).
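Here is a rough sketch of that procedure for the first $n$ odd integers. It is a simplification under stated assumptions: rather than detecting an exact repeat, it stops as soon as no composite remains (from then on the row only rotates), and it gives up after a hypothetical `max_steps` cap instead of diagnosing a failure cycle. All function names are my own.

```python
def gpf(n):
    """Greatest prime factor of n >= 1, with gpf(1) taken to be 1 (an assumption)."""
    largest = 1
    d = 2
    while d * d <= n:
        while n % d == 0:
            largest = d
            n //= d
        d += 1
    return n if n > 1 else largest


def push_step(row):
    """One wrap-around factor-pushing step."""
    g = [gpf(x) for x in row]
    return [row[i] - g[i] + g[i - 1] for i in range(len(row))]


def is_composite(n):
    return n > 1 and gpf(n) != n


def settle(row, max_steps=10_000):
    """Push factors until no composite remains; return (row, steps) or None."""
    for steps in range(max_steps + 1):
        if not any(is_composite(x) for x in row):
            return row, steps
        row = push_step(row)
    return None  # did not settle within the cap


n = 6
start = [2 * i + 1 for i in range(n)]  # first n odd integers, summing to n**2
result = settle(start)
if result is not None:
    final, steps = result
    # Settled values strictly between n and n**2 are the primes we are after.
    witnesses = [p for p in final if n < p < n * n]
    print(final, steps, witnesses)
```

For $n=6$ this settles after a single step, with 11 and 13 landing in $(6,36)$.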

It turns out there are lots of neat effects you see depending on the exact sort of set you provide. Of particular note is what happens when even numbers are included.

You’ll notice values of $2$ are green and prime powers are beige. Using the standard transform, a group of odd numbers can never develop an even number: each step, an odd number will be subtracted, and an odd number will be added, leaving the parity unchanged. This is because only primes that are present as a factor in one of the values will be in play, so if no twos exist anywhere as a factor at the outset, they never will.

When even numbers are included, as in the example above, you’ll notice that those columns which start out even tend to be composite for much longer than their odd counterparts. This is because if parity is typically left unchanged, and a number has a factor of $2$, there are only two ways to get rid of it:

1. Have a $2$ to its left.
2. Through its random-walk-like navigation of integers, land on a power of $2$.

Both things will flip the parity of the number, with one important difference. If a $2$ comes in from the left, that $2$ is effectively consumed and stops traveling along, either by annihilating an existing factor of $2$ in the composite number, or if the composite is odd, being absorbed to make it even. If instead a column spontaneously finds some $2^k$, then it will be odd thereafter, but will also emit a $2$, since that’s its largest factor.
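To make these cases concrete, here is the arithmetic for a few hypothetical columns. The specific values are purely illustrative and are not taken from the table.

$$\begin{aligned}
12 - \textrm{gpf}(12) + 2 &= 12 - 3 + 2 = 11 && \text{(even composite, incoming 2: becomes odd)}\\
15 - \textrm{gpf}(15) + 2 &= 15 - 5 + 2 = 12 && \text{(odd composite, incoming 2: becomes even)}\\
32 - \textrm{gpf}(32) + 7 &= 32 - 2 + 7 = 37 && \text{(power of 2, incoming 7: becomes odd, emits a 2)}
\end{aligned}$$

In the first two lines the column passes along its own gpf ($3$ and $5$ respectively), so the incoming $2$ travels no further. In the third, the $2$ is what gets sent to the right.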

The subjective effect reminds me of particles and antiparticles. You can see in the table above how composite columns every once in a while spontaneously emit a $2$ particle after landing on a power of $2$, which then travels along until it encounters some other even composite column, whereupon they cancel each other out. You get truly enormous sets of primes out of this under the right conditions. More interestingly to me, just by inspecting the table, you might notice that this property pairs up all the even columns: each one is the emitter or receiver of a $2$ at least once. In fact, every even number can easily be paired with exactly one other number, even in seemingly more complicated situations.

In the table above, $28$ is paired with $34$. This is because $28$ finds $2^5$, emits a $2$, which then hits column $36$, but connects backwards to $34$, which had its own particle emission. If you think of this in terms of 1s and 0s, there’s a clear unbroken line connecting the first value in each column. These lines cannot cross: any even number inside the range of a connected pair must have its own partner also in that range.

If you do render this sort of situation as 0s and 1s, and plot it, you get something like this:

This is the first however-many integers, just as in the table earlier, but simply showing black for odd values and white for even values. The pairing up is pretty clear here. Note that in this version, I’m not using a wrap-around finite sequence, but rather a finite range of values which doesn’t wrap around, but does import from the left and export to the right, so to speak. You’ll notice a few lines extending all the way to the bottom: these signify columns that are remaining composite and even for longer than was practical to calculate or render. Theoretically, however, every line must eventually terminate in a connection to some other line, somewhere. In the finite wrap-around setup, the same is true, although if you start with an odd number of even numbers, you’ll have one left over, which presents as there being a single $2$ in the resulting row of primes.

At the time I was doing this, I did play around with a number of variants for visualizing the same or similar setups. The smoother ones are from using logs on a similar approach using lpf instead of gpf; in the wide one, you can clearly see the transition between negative and positive integers. The relative brightness indicates more composite activity in the negatives, something that is even starker in the final picture, which is what it looks like if you look at factors of $3$ instead of $2$.

## On the persistence of zeros

I think I have a proof of the aperiodicity of the center column in Wolfram’s Rule 30 open question.

#### Outline

1. Instead of assigning ones and zeros, we’ll be using boolean algebra to represent cell values. This allows us to ratchet up the complexity of the system in a more understandable way. We let $k$ be the number of contiguous variables we start out with, also known as our seed row.
2. We show that the presence of a pure zero is sufficient to prevent cycling.
3. We demonstrate by induction that adding a variable to our seed row cannot result in a cycle.

#### 1. Computing Rule 30 as boolean expressions.

The principle is simple. Rather than starting with ones and zeros, we start with an arbitrary number of variables. All subsequent calculations are carried out in the same way. The principal operation is still $f(a,b,c)=a+b+c+bc \pmod 2$.

To give some concept of this, see the three examples below, representing 1-, 2- and 3-variable configurations respectively.

Note that when carrying out the Rule 30 function in this context, we can do some simple optimizing simplification to keep things clean. Exponents can be dropped without effect, as $1^m=1$ and $0^m=0$. Terms with even coefficients can be dropped completely, as they’re modded out, and odd coefficients can be omitted, i.e. an implied $1$.
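One way to sketch this in Python is to store each expression as a set of monomials, with each monomial a set of variable names. This representation is my own choice for illustration, not necessarily how the tables here were generated. Set union then drops exponents automatically, and toggling membership drops even coefficients. The argument order (left, center, right) and the helper names are assumptions.

```python
ZERO = frozenset()  # the empty sum, i.e. the pure 0 expression


def var(name):
    """A single seed variable as a one-monomial polynomial."""
    return frozenset([frozenset([name])])


def add(p, q):
    """Addition mod 2: monomials present in both cancel."""
    return p ^ q


def mul(p, q):
    """Multiplication mod 2 with x*x = x (set union merges repeated variables)."""
    out = set()
    for m in p:
        for n in q:
            out ^= {m | n}  # toggling handles even coefficients
    return frozenset(out)


def rule30(left, center, right):
    """f(a, b, c) = a + b + c + bc (mod 2)."""
    return add(add(left, center), add(right, mul(center, right)))


def step(row):
    """One generation; the row grows by one cell on each side."""
    padded = [ZERO, ZERO] + list(row) + [ZERO, ZERO]
    return [rule30(*padded[i:i + 3]) for i in range(len(padded) - 2)]


def evaluate(p, values):
    """Plug 0/1 values into a polynomial and reduce mod 2."""
    return sum(all(values[v] for v in m) for m in p) % 2


def show(p):
    """Render a polynomial such as a+b+ab; the pure zero prints as 0."""
    if not p:
        return "0"
    monomials = sorted(p, key=lambda m: (len(m), "".join(sorted(m))))
    return "+".join("".join(sorted(m)) or "1" for m in monomials)


row = [var("a"), var("b")]
for _ in range(3):
    print(" | ".join(show(p) for p in row))
    row = step(row)
```

Evaluating any cell with concrete 0/1 values for the seed variables gives back the ordinary Rule 30 bit for that seed, which is what makes the symbolic table a stand-in for every seed at once.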

#### 2. The utility of zero.

This is not to be confused with the more typical $0$ you’d find in a cellular automaton; the $0$s we’re concerned with are boolean expressions, and thus are much more infrequent.

Well, not to start. As you can see in the tables above, the 1-variable version has a roughly equal number of $a$ and $0$ cells. Moving on to the 2-variable version, I believe the expected frequency drops to $1/4$. For the 3-variable version, it drops to $1/8$, and the dropoff becomes steeper from there. At best the expectation for a pure $0$ goes as $2^{-k}$, although I’m still working on whether it could actually be doubly exponential. Fortunately, that’s a detail which doesn’t impact the proof.

We can assign a $1$ value to any single one of our seed variables and effectively horizontally translate the entire structure. If we want, we can even assign values corresponding to a later row, which has the effect of translating the structure vertically. In this case, all we need to realize is that if we encounter a $0$-expression cell, it means that no possible setting of the current seed variables can toggle this bit.

Since we could place our initial $1$ seed cell way to the left, or way to the right, yet have it make no difference, it indicates that the corresponding $k$ cells in the row with the zero are all $0$-valued, in the typical sense. This in turn indicates a “white triangle” of width at least $k$, and implies a vertical run of $0$ cells of at least $k/2$, the height of the triangle.

If we are able to start with an arbitrarily large $k$ and still be guaranteed to find a $0$ (or “null”) expression eventually, that directly implies an arbitrarily large run of $0$ values in the center column. For any alleged period of length $p$, we need only set $k >2p$ and we show such a cycle cannot exist.

#### 3. Proof of existence.

Inspection of boolean tables like the ones above makes it clear that when adding one variable at a time to the end of the seed row, the behavior of the resulting values is predictable in some ways. One way to interpret things is that cells may be defined in terms of their left and right neighbors; when adding a variable to the end of the seed row, it equates to values flowing to the right, roughly speaking.

This means that a null expression can only take on some other value if the cell to its immediate left has some other value; to be more clear, upon the addition of a seed variable, a null expression cannot change if the cell to its left is also null.

All we need to show is that there is at least one of these null expressions with a buddy to the left. So here, we use a sort of infinite descent argument. For us to lose all of our null expressions in the column would require that the column to our left had lost all of its nulls after adding the previous seed variable; the cells’ content is additive when treated in this way, and so the same pattern of cells will be filled in each new column.

## Proof summary for aperiodicity of Rule 30

#### This is rule 30.

The center column is highlighted, as we are interested in determining whether or not it’s possible for it to ever degenerate into a cyclic pattern.

Today, I have a pretty good outline for how to show that it cannot.

### Rule 30 as boolean algebra

We can always substitute variables in place of the hard 1s and 0s you usually deal with in cellular automata.

#### 1 variable

Here’s one of the simplest replacements we could do.

Here, we merely replaced the single cell that starts all this, the $1$ at the top, with $a$. That $a$ is propagated through the system exactly how the $1$ would be. Note that it would also be legitimate (if boring) to let $a=0$, in which case we’re left with an empty universe.

#### 2 variables

If we add another variable, things begin to get interesting:

If we let $a=0,b=1$, then $b$ becomes the center column exactly as before. If we set $a=1$, however, it becomes the de facto center column*, regardless of the value of $b$. For a while, I thought there may be a trick along those lines to show that neither one could therefore be cyclic, but I wasn’t able to find anything, other than a conclusion that the appearances of $a$ and $b$ must be aperiodic even if the resulting pattern is not.

Apart from that, the situation is also getting slightly messier. We now have several values we see coming up repeatedly, listed in the table to the right. This is our first look at something that will become critical later, that new expressions are generally built by pasting two different sets together.

* We don’t mean literal center column here, but rather that it will contain the same pattern found in the typical Rule 30 configuration with a single $1$ cell in the center column seeding the structure.

#### K.I.S.S.

Incidentally, without simplification, an algebraic approach would be immediately intractable. If all we do is apply $f(a,b,c):=a+b+c+bc$, we can see the problem after only a couple of steps, in the same 2-variable configuration used above:

To limit the combinatorial explosion, there are two simple optimizations we can make since we’re dealing with boolean values. First, all exponents can be dropped, since $0^m=0$ and $1^m=1$ for any $m$. Second, we drop any terms with even coefficients, and strip the odd coefficients, which takes our expressions most of the way toward $\pmod{2}$ as they’re meant to be. Alas, even after all that, we’ll still face doubly exponential growth.
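As a worked instance, assume the seed row is $\ldots,0,a,b,0,\ldots$ and that $f$ takes its arguments in (left, center, right) order; both are assumptions about the layout. The first step puts $f(0,0,a)=a$ to the left of $a$'s column and $f(0,a,b)=a+b+ab$ directly under $a$. One more step, applied to the cell just left of $a$'s column, then goes as follows:

$$\begin{aligned}
f(0,\,a,\,a+b+ab) &= 0 + a + (a+b+ab) + a(a+b+ab)\\
&= 2a + b + 2ab + a^2 + a^2b\\
&\to 2a + b + 2ab + a + ab && \text{(drop exponents)}\\
&= 3a + b + 3ab\\
&\to a + b + ab && \text{(reduce coefficients mod 2)}
\end{aligned}$$

Five raw terms collapse back to three, and the order in which the two simplifications are applied makes no difference to the result.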

#### 3 variables

When we add a third variable, things move into full gear.

Although there are clearly a lot of new expressions here, not all possible expressions are represented. For $k$ variables, able to combine in up to $2^k$ additive terms, each of which may itself include or not include $k$ variables, the number of raw possible expressions grows doubly exponentially as $2^{2^k}$. That growth rate has not been the case so far, but it will be from here on out, presumably a result of reaching the three operands that Rule 30 expects. As a result, the number of distinct algebraic expressions quickly becomes unmanageable even after our simplifications.

Fortunately, we won’t need to manage it. There’s an invariant property that readily scales with this, and that’s what we’ll be using here.

### Continuing the pattern

There are a few expressions that do appear that we’ll be ignoring, but this is because they’re either part of the cycling diagonals off to the side, which doesn’t help us, or because it’s one of the first few entries in the column, which don’t actually repeat. (Interestingly, any expression in the form $a+b+c+bc$ falls into this category, and only appears once.) Given that, let’s focus only on expressions that appear on row $n\geq 3$, in the center column.

For all $k \geq 3$, if you process sufficiently many rows to see every legal expression at least once, you’ll find that within those $2^{3\cdot 2^{k-3}}$ distinct expressions, if you chop up the polynomials and take a raw count of all monomial terms, every possible term appears, and each one appears in precisely one-half of all expressions. The only exceptions to this are the null-set-expression $0$, which may occur as infrequently as $1$ in $2^{2^k}$ cells (but needs further research), and any triplets like $bcd, cde, def, \ldots$, consisting of exactly three distinct consecutive in-range variables. There are only $k-2$ of these, and they appear in only a quarter of the overall list.
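For anyone wanting to poke at these counts, here is a rough sketch of the bookkeeping. Everything about the setup is an assumption on my part: the seed variables sit in adjacent cells, `column` picks which seed position to follow downward, `skip` discards the first few rows, and the number of rows needed to see every expression is left to the caller. Whether it reproduces the figures above depends on matching the conventions used for the tables, so no output is claimed here.

```python
from collections import Counter

ZERO = frozenset()  # the pure 0 expression


def var(name):
    return frozenset([frozenset([name])])


def mul(p, q):
    out = set()
    for m in p:
        for n in q:
            out ^= {m | n}  # x*x = x via union; duplicates cancel mod 2
    return frozenset(out)


def rule30(left, center, right):
    return left ^ center ^ right ^ mul(center, right)  # a + b + c + bc


def step(row):
    padded = [ZERO, ZERO] + list(row) + [ZERO, ZERO]
    return [rule30(*padded[i:i + 3]) for i in range(len(padded) - 2)]


def column_expressions(names, rows, column, skip=3):
    """Distinct expressions seen in one column on rows skip..rows-1."""
    row = [var(v) for v in names]
    seen = set()
    for r in range(rows):
        if r >= skip:
            seen.add(row[column + r])  # each step adds one cell on the left
        row = step(row)
    return seen


def monomial_share(expressions):
    """Fraction of the given expressions in which each monomial appears."""
    counts = Counter(m for p in expressions for m in p)
    total = len(expressions)
    return {"".join(sorted(m)): c / total for m, c in counts.items()}


exprs = column_expressions("abc", rows=40, column=1)
print(len(exprs))
print(monomial_share(exprs))
```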

But back to the main story here. Yes, they all share very equitably, and in fact, it’s so even that if you remove every expression containing any given monomial or subexpression, the count of all remaining monomials is exactly halved, suggesting an extremely organized distribution.

Take the 4-variable case as an example, which will have $64$ distinct expressions. This means that $32$ of those will have $a$ as a monomial, and $16$ of the other $32$ will have $b$ as a monomial, and $8$ of the remaining $16$ will have $c$ as a monomial, and finally $4$ of those last $8$ will have $d$ as a monomial (with no $a$, $b$, or $c$). That leaves $4$ expressions to be accounted for; of those, $0$ uses one slot, and the other three will consist entirely of multivariable monomials.
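Written out as a tally, and taking the halving pattern as given, the 4-variable count is

$$64 = \underbrace{32}_{a} + \underbrace{16}_{b,\ \text{no } a} + \underbrace{8}_{c,\ \text{no } a,b} + \underbrace{4}_{d,\ \text{no } a,b,c} + \underbrace{4}_{\text{no single-variable term}}.$$

If the same halving holds for $k$ variables and $N$ distinct expressions, the identity $N = N/2 + N/4 + \cdots + N/2^k + N/2^k$ leaves $N/2^k$ expressions with no single-variable monomial, one of which is $0$.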

Any of those final four expressions is indicative of a vertical run of $0$s of width at least $4$ and height at least $2$. This principle can be extended to any number of variables to show arbitrarily large contiguous spans of $0$, disallowing any periodicity.

#### Ratcheting up complexity

The tables we’re building are additive in the sense that, upon adding a new variable to the seed row, no existing addends will ever be removed as a result. This property gives us a welcome measure of stability. Instead, those cells that do change values can do so only by taking on additional additive terms, all of which must be divisible by the new variable.
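This additivity is easy to check mechanically. The sketch below uses my own set-of-monomials representation and hypothetical helper names. It builds the table for $\{a,b,c\}$ and for $\{a,b,c,d\}$, and confirms that deleting every term containing $d$ from the larger table recovers the smaller one cell for cell.

```python
ZERO = frozenset()  # the pure 0 expression


def var(name):
    return frozenset([frozenset([name])])


def mul(p, q):
    out = set()
    for m in p:
        for n in q:
            out ^= {m | n}  # x*x = x via union; duplicates cancel mod 2
    return frozenset(out)


def rule30(left, center, right):
    return left ^ center ^ right ^ mul(center, right)  # a + b + c + bc


def step(row):
    padded = [ZERO, ZERO] + list(row) + [ZERO, ZERO]
    return [rule30(*padded[i:i + 3]) for i in range(len(padded) - 2)]


def table(names, rows):
    """The first `rows` generations grown from a seed row of variables."""
    row = [var(v) for v in names]
    out = []
    for _ in range(rows):
        out.append(row)
        row = step(row)
    return out


def without(p, v):
    """Keep only the monomials of p that do not involve variable v."""
    return frozenset(m for m in p if v not in m)


def is_additive_extension(names, new, rows):
    """True if appending `new` to the seed only ever adds terms containing it."""
    small = table(names, rows)
    big = table(names + new, rows)
    for s_row, b_row in zip(small, big):
        s_row = s_row + [ZERO]  # the extra rightmost column did not exist before
        if any(without(b, new) != s for s, b in zip(s_row, b_row)):
            return False
    return True


print(is_additive_extension("abc", "d", 8))  # True
```

The reason it has to come out this way is that setting $d=0$ commutes with $f$, so the $d$-free part of every cell evolves exactly as it did before $d$ was introduced.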

Given that, let’s look at the actual driving force behind the complexity generation: the multiplicative operation. In fairness, the real source of the complexity is the interweaving of multiplication and addition (as is true in literally all things), but combinatorially speaking, the multiplication is the killer.

The additions are able to mix and match terms to some extent, and thus are half of that double exponential, but the multiplications are able to glue together what effectively become new proxy variables which must be independently tracked downstream, amplifying the capacity of the additions to swirl together novel expressions. In particular, every new seed variable appears to permeate the system sufficiently such that all $2^k$ possible terms appear.

That said, it’s worth repeating that although all terms appear, not all possible expressions do. As mentioned, the general $a+b+c+b c$ form is a one-and-done, and also worth special attention is the lack of solitary single variable expressions: $a$ is the only one that you’ll see, discounting degenerate cycles on the diagonals. For any variable past $a$, you will never see it on its own after being introduced; in every expression where it appears as a term, it’s always part of a larger whole, e.g. you’ll see $c+ac+bc$, at the very least. This is a general pattern that seems to hold: $d$ never appears on its own but requires at least $d+bd+cd$, and so forth. Close examination of the genesis of these terms in the tables below makes some sense of this behavior.

Although these tables are obviously becoming unwieldy, here are the beginnings of the 4- and 5-variable progressions.

Without serious optimization, identifying the distinct expressions for the 5-variable case is right at the limit of what my computer can reasonably handle using 64 GB of memory and a day of computing time. I suspect that no amount of optimization would allow brute force verification of the 6-variable case without special hardware. So on the one hand, I am basing my theory on an extremely limited data set. On the other, the specific nature of the data strongly suggests to me that the patterns identified likely hold at any level, but to prove it, I still need to work out a more compelling theory of how those exact properties do hold.

### Mechanism of operation

Consider the progression when starting with $\{a,b,c\}$. You get $8$ distinct expressions from that; in fact, we’ll pop up that table now.

Now consider the progression when starting with $\{b,c,d\}$. By this, I mean with $b$ and $c$ in their usual columns, and padded by $0$ as always, so yes: this is isomorphic to the $\{a,b,c\}$ progression. See, I’m attempting to lay conceptual groundwork for picturing them coming together from their own individual starting points, so we can think through what happens when we combine them as with $\{a,b,c,d\}$.

It should be obvious that $\{b,c,d\}$ on its own will also generate eight unique expressions, and essentially the same ones that are given here. To determine them exactly, it generally suffices to simply shift all the variables forward or backward. Doing this, we find that all of the expressions for $\{b,c,d\}$ will be different (excepting $0$).
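The shifting can be checked directly with a small sketch. The set-of-monomials representation and the `shift` helper are my own illustration. It renames each variable to the next letter and compares the $\{a,b,c\}$ table against the $\{b,c,d\}$ table cell by cell.

```python
ZERO = frozenset()  # the pure 0 expression


def var(name):
    return frozenset([frozenset([name])])


def mul(p, q):
    out = set()
    for m in p:
        for n in q:
            out ^= {m | n}  # x*x = x via union; duplicates cancel mod 2
    return frozenset(out)


def rule30(left, center, right):
    return left ^ center ^ right ^ mul(center, right)  # a + b + c + bc


def step(row):
    padded = [ZERO, ZERO] + list(row) + [ZERO, ZERO]
    return [rule30(*padded[i:i + 3]) for i in range(len(padded) - 2)]


def table(names, rows):
    """The first `rows` generations grown from a seed row of variables."""
    row = [var(v) for v in names]
    out = []
    for _ in range(rows):
        out.append(row)
        row = step(row)
    return out


def shift(p, offset=1):
    """Rename every variable by moving it `offset` letters along the alphabet."""
    return frozenset(frozenset(chr(ord(v) + offset) for v in m) for m in p)


abc = table("abc", 6)
bcd = table("bcd", 6)
same = all(shift(p) == q for r1, r2 in zip(abc, bcd) for p, q in zip(r1, r2))
print(same)  # True
```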

Conceivably, the merging process between these two sets will somehow come to settle on $8^2=64$ distinct expressions, highly suggestive of pairing, but it’s unclear how to get more precise than that. And thus I guess we’ve gotten to where I’m stuck.

I’m probably better off sticking with the inductive-like approach of considering what happens when you add one more seed variable. The even splitting of terms completely regularly between all expressions, as well as the number of expressions itself simply repeatedly squaring, both scream that there is a very orderly process behind this part of things, but I can’t put my finger on it if so. And I wish I could, because a proof would immediately follow if I could show this distribution holds, or even a decent tangential result.

## Variable displacement

When $b=1$, the center column must be periodic. This means that all cells not dependent on $b$ must be periodic regardless of anything else.

When $b=0$, then $a$ controls whether the left or center column behaves as the typical center column. If $a=0$, then the center is the center and must be periodic; if $a=1$, then the left column becomes the center and must be periodic.